CHAPTER 1


INTRODUCTORY  REMARKS 

1.1.  Introduction

The  problem  considered  in  this  paper  is  the  estimation  of  the 
parameters  in  the  mixed  model  of  the  analysis  of  variance  by  the 
method  of  maximum  likelihood,  assuming  normality  of  the  random 
effects  and  errors.  Both  asymptotic  properties  of  such  estimators 
as  the  size  of  the  design  increases  and  numerical  methods  for  their 
calculation  are  considered.  The  mixed  model  has  been  studied  for 
many  years.  Various  methods  have  been  suggested  for  use  in  specific 
cases, but without a unified theory applying to all cases. The method
of  maximum  likelihood,  which  provides  such  a  unified  theory,  has  not 
been  used  in  the  past  because  the  complexity  of  the  likelihood 
equations  in  general  cases  made  their  solution  very  difficult. 

Computers  have  now  made  feasible  the  solution  of  the  likelihood 
equations;  this  fact  makes  the  study  of  the  properties  of  these 
estimators  of  interest. 

This paper extends certain asymptotic results of H. O. Hartley and
J. N. K. Rao (1967) and Whitby (1971) to cover the asymptotic behavior
of these estimators in great generality. Both of these previous sets
of results have restrictions confining their application to a
narrower class of models to which many interesting cases do not
belong. In this paper the theory has been extended to cover almost
all models. (Certain degenerate cases are not considered.) The
theory presented here applies to both balanced and unbalanced designs.
The exact model used is given by Hartley and Rao as


$$y = X\alpha + U_1 b_1 + U_2 b_2 + \cdots + U_{p_1} b_{p_1} + e,$$

where $y$ is an $n \times 1$ vector of observations; $X$ is an $n \times p_0$ design matrix
for the $p_0 \times 1$ vector of fixed effects $\alpha$; $U_i$ is an $n \times m_i$ design matrix
for the $m_i \times 1$ vector of random effects $b_i$, $i = 1, 2, \ldots, p_1$; $e$ is an $n \times 1$
random vector of errors. It is assumed that the expected value of
each of the random vectors is the zero vector and that the covariance
matrix of $b_i$ is $\sigma_i I$, $i = 1, 2, \ldots, p_1$, and the covariance matrix of $e$ is
$\sigma_0 I$, where the identity matrices are of appropriate size. (See the
note in Section 1.3 on the use of $\sigma_i$ instead of $\sigma_i^2$ for a variance.)
It is further assumed that $b_1, b_2, \ldots, b_{p_1}$, and $e$ are mutually independent
and that each has a multivariate normal distribution. It follows
that $y$ has a multivariate normal distribution with mean $X\alpha$ and covariance
matrix

$$\Sigma = \sigma_0 I + \sigma_1 U_1 U_1' + \sigma_2 U_2 U_2' + \cdots + \sigma_{p_1} U_{p_1} U_{p_1}'.$$

A full description of the model and the basic assumptions about it is
given in Section 1.3. The problem is to estimate $\alpha$ and
$\sigma_0, \sigma_1, \ldots, \sigma_{p_1}$ by the method of maximum
likelihood and to study the properties of such estimators.

1. The author acknowledges his indebtedness to Hartley and Rao for
the form of the basic model and to Whitby for the form of Theorems
3.2.1 and 3.3.1.

Consistency and asymptotic normality of maximum likelihood estimators
are often proved by using a Taylor series expansion of the
likelihood equations about the true parameter point (Cramér [1946]).
There  are  often  independent,  identically  distributed  observations 
and,  in  this  case,  conditions  are  placed  on  the  common  density  function 
to  allow  the  correct  expansions  to  be  made.  The  asymptotic  theory  is 
then carried out by normalizing by some sequence depending on the
number of observations. (Usually, but not always, the square root is
used; see Whitby [1971:2].) Work has also been done with cases where
the observations are not independent, not identically distributed, or
neither independent nor identically distributed. (For example, see
Silvey [1961].) However, none of the above theory can be directly
applied  to  this  problem.  One  generally  does  not  think  of  an  analysis 
of  variance  as  a  sequence  of  observations  but  as  an  experiment;  that 
is,  one  thinks  of  an  analysis  of  variance  as  one  observation  of  a 
vector  of  variables  constituting  an  experiment.  Asymptotic  theory  is 
then usually carried out on a "conceptual sequence of experiments" for
which  the  size  of  the  entire  design  becomes  infinite  in  some  orderly 
way.  The  work  done  in  this  area  of  the  analysis  of  variance  does  not 
apply  in  all  cases  either.  The  problem  is  that  often  the  estimate  of 
each  parameter  requires  a  different  normalizing  sequence;  this  problem 
is  discussed  further  below. 

The  results  presented  here  are  extensions  of  the  work  of  Whitby 
in  the  following  sense.  Whitby  proved  theorems  dealing  with  a  general 
maximum  likelihood  estimation  problem.  He  considered  estimation  of 
several  parameters  but  his  proofs  depend  on  the  normalizing  sequence 
(he calls it c) being the same for each parameter and on the norm of
a $p \times 1$ vector $x$ being $\|x\| = (\sum_i x_i^2)^{1/2}$. He mentioned that different
normalizing sequences may be necessary but gave no indication of how
they  are  obtained  or  why  they  might  be  necessary.  The  theorems 
presented  here  are  much  more  general.  They  allow  any  legitimate 
vector  norm  to  be  used  and  they  allow  the  estimate  of  each  parameter 
to  have  its  own  normalizing  sequence.  To  see  that  such  an  extension 
may be necessary in an analysis of variance, one need only consider
the balanced two-way model. The sufficient statistics in this model
are the grand mean and several sums of squares. The sums of squares
are a set of independent chi-square random variables with degrees of
freedom which will increase at different rates (see Sections 6.1 and
6.2); any analysis of the maximum likelihood estimators (which of
course  are  functions  of  the  sufficient  statistics)  must  take  account 
of  these  differences. 
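
The differing rates can be made concrete with a small sketch. It assumes a balanced two-way crossed layout with a rows, b columns and r replicates per cell, and uses the standard degrees of freedom for that layout; the function name `two_way_df` and the argument names are illustrative and do not come from this paper.

```python
def two_way_df(a, b, r):
    """Degrees of freedom of the sums of squares in a balanced two-way
    crossed layout with a rows, b columns and r replicates per cell.
    (Illustrative helper; names are assumptions, not the paper's notation.)"""
    return {
        "rows": a - 1,
        "columns": b - 1,
        "interaction": (a - 1) * (b - 1),
        "error": a * b * (r - 1),
    }


if __name__ == "__main__":
    # Let the number of rows grow while columns and replicates stay fixed.
    for a in (5, 50, 500):
        print(a, two_way_df(a, b=4, r=2))
```

In this sketch, letting a grow with b and r fixed makes the row, interaction and error degrees of freedom increase while the column degrees of freedom stay at b - 1, which is the kind of imbalance that a single normalizing sequence cannot accommodate.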

The freedom to allow the estimate of each parameter to have its
own normalizing sequence is achieved by the artifice of building the
normalizing sequence into the parameter, obtaining a set of sequences
of parameters. The basic asymptotic theorem, Theorem 3.3.1, deals
only with such sequences of parameters. It is quite general; in fact,
most of the usual asymptotic results about maximum likelihood estimation
can be derived as special cases of Theorem 3.3.1. In Chapter 4
Theorem 3.3.1 is used to prove the consistency and asymptotic normality
of the maximum likelihood estimators in the mixed model of the
analysis of variance. This is done by translating back from the
properties of estimators of a set of sequences of parameters to a
sequence of estimates of a set of parameters.

One  advantage  of  the  use  of  the  above  method  of  proof  for  Theorem 
3.3.1  and  its  application  in  Theorem  4.4.1  is  that  the  problems  incurred 
when  the  wrong  normalizing  sequence  is  used  are  clearly  located.  These 
problems are easily illustrated by considering the simple case of
correctly normalizing $\bar{X}_n$, the arithmetic mean of $n$ independent, identically
distributed random variables, each with mean zero and finite nonzero
variance. If the normalized estimate is $Y_n = n^{1/2+\epsilon} \bar{X}_n$, then only for
$\epsilon = 0$ will a limiting normal distribution be obtained. If $\epsilon < 0$ the
normalizing sequence is "too small" and the $Y_n$ converge to a degenerate
(point) distribution at zero. If $\epsilon > 0$ the normalizing sequence is
"too large" and the distributions of $Y_n$ "blow up" to a distribution
having atoms at plus and minus infinity. This phenomenon has sometimes
been described in the following manner: if $\epsilon < 0$ the asymptotic
variance is zero and if $\epsilon > 0$ the asymptotic variance is infinite.

While  the  first  descriptions  of  the  phenomenon  are  technically  more 
accurate,  the  second  descriptions  do  point  out  a  way  of  locating  the 
problem.  In  the  analysis  of  variance  model  the  problems  manifest 
themselves  in  the  matrix  J  defined  in  Section  4.3. 
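
The effect of the exponent on the normalized mean can be checked numerically. The sketch below uses the elementary fact that the mean of n independent variables with variance sigma2 has variance sigma2/n, so the normalized mean has variance sigma2 * n**(2*eps); the simulation assumes standard normal observations purely for convenience. The function names and the use of Python's `random` module are illustrative assumptions, not part of the paper.

```python
import random
import statistics


def variance_of_normalized_mean(n, eps, sigma2=1.0):
    """Exact variance of Y_n = n**(0.5 + eps) * (mean of n i.i.d. zero-mean
    variables with variance sigma2), namely sigma2 * n**(2 * eps)."""
    return sigma2 * n ** (2.0 * eps)


def simulate_normalized_mean(n, eps, reps=300, seed=0):
    """Monte Carlo estimate of Var(Y_n), assuming N(0, 1) observations."""
    rng = random.Random(seed)
    scale = n ** (0.5 + eps)
    draws = []
    for _ in range(reps):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        draws.append(scale * xbar)
    return statistics.pvariance(draws)


if __name__ == "__main__":
    for eps in (-0.25, 0.0, 0.25):
        for n in (10, 100, 1000):
            print(eps, n, variance_of_normalized_mean(n, eps),
                  round(simulate_normalized_mean(n, eps), 3))
```

With eps = 0 the variance stays at sigma2 for every n; with eps < 0 it shrinks toward zero and with eps > 0 it grows without bound, matching the "asymptotic variance" description given in the text.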

The matrix $J$ is the limit of the matrix of expected values of
second derivatives of the log-likelihood. This matrix has had the
normalizing sequences built into it. Only if the normalizing sequence
for each parameter is of precisely the right order of magnitude will
$J$ be positive definite. If the sequence for some parameter is "too
small" the limit of the appropriate diagonal element will be infinite.
If the sequence is "too large" the entire row and column of $J$ associated
with that parameter will be zero. Since $J^{-1}$ is the asymptotic covariance
matrix it is easily seen that results analogous to the case for $\bar{X}_n$ occur
in the model under study. The problems of zero or infinite "asymptotic
variances" manifest themselves in the nonexistence of the limits forming
$J$ or the fact that $J$ is not positive definite. However, the proofs
of Theorems 3.3.1 and 4.4.1 point out further what goes wrong. The
proof of Theorem 3.3.1 breaks down completely if $J$ is not positive
definite. Proofs of the lemmae used to prove Theorem 4.4.1 also break
down if the normalizing sequences are not of the correct orders of
magnitude. The assumptions of Section 4.2 insure that for the sequence
of designs considered in Theorem 4.4.1 the matrix $J$ will be positive
definite.

The results presented here are also extensions of the work of
H. O. Hartley and J. N. K. Rao. Hartley and Rao (1967:101) make the
following assumption about the asymptotic behavior of the design
matrices $U_i$: Every column of each $U_i$ may contain at most a finite
number of nonzero elements. The two-way balanced model mentioned
above illustrates that this assumption rules out any sort of balanced
crossed layout. (See Section 6.1.) The effect of this assumption is,
in fact, to assure that the estimates of all the $\sigma_i$ can be normalized
by the same normalizing sequence. As noted above, the results presented
here are not so restrictive. In fact, any sequence of designs that
might be considered as an actual sequence of experiments is covered by
these results; the assumptions used here may appear restrictive but
they merely rule out cases that are useful only as counterexamples to
theorems and not as actual design sequences.

Hartley  and  Rao  went  on  in  their  paper  to  assert  that  under 
their  assumptions,  the  maximum  likelihood  estimators  are  consistent 
and  asymptotically  efficient,  although  they  did  not  make  clear  what 
they  meant  by  the  latter  term.  They  sketched  but  did  not  give  details 
of  a  method  of  proof.  In  fact  the  details  constitute  the  difficult 
part of the proof. With a great deal of effort, and after corrections
are made to their assumptions, their method of proof (with details)
yields the claimed results. (If the assumptions are taken as written,
counterexamples to the theorems can be found.) A more detailed
discussion of their proof is given in Appendix D. The methods used here
require no less effort but cover all interesting cases.

Thus  far  only  asymptotic  theory  concerning  the  maximum  likelihood 
estimates has been discussed. The numerical computation of the estimates
is also a fruitful area for study. One problem in this case is
the  problem  of  negative  estimates.  The  maximum  likelihood  estimate  of 
a  variance  can  never  be  a  negative  number;  such  a  point  would  not  be  in 
the  parameter  space  and  would  be  ineligible  for  a  maximum  likelihood 
estimate.  Thus  some  sort  of  truncation  procedure  must  be  used  to 
insure  nonnegative  estimates.  The  procedure  used  when  the  estimates 
are obtained as solutions to the likelihood equations is given in
[...] truncation ceases to be a problem. Asymptotic theory when the true
parameter point is on the boundary has not been developed for this
model; in some simple boundary cases, asymptotic distributions occur
which are definitely not normal. Thus some further extension of
methods of proof will be required in these cases.

The actual numerical procedures for the computation of the estimates
are discussed in Chapter 5. An iterative procedure suggested by Anderson
(1971b), (1973) is compared with a procedure suggested by H. O. Hartley
and J. N. K. Rao (1967) and implemented by Vaughn (1970). The procedure
of Anderson was found to be computationally much more efficient
than the Hartley, Rao, Vaughn procedure in a Monte Carlo study. J. N. K.
Rao (1973) has pointed out that the Anderson procedure is in effect the
method of scoring. A computer program developed to implement the
Anderson procedure is discussed in Appendix C.


1.2.  Notation  and  Conventions  Used 


The following notation will be used throughout this paper. All
vectors are column vectors and are underscored with the symbol ~ to
represent boldface type. Matrices are also underscored in the same
manner. With three exceptions (G, R, and sometimes Y) vectors are
represented by small Latin or Greek letters; matrices are always
represented by capital Latin or Greek letters. All vectors belong to
the space $R^m$ (the Cartesian product of $R$, the real line, with itself
$m$ times) of the appropriate dimension. A vector norm is denoted $\|\cdot\|$
and the matrix norm it induces as a natural norm is also denoted by
$\|\cdot\|$; i.e., $\|A\| = \sup_{x \ne 0} \|Ax\| / \|x\|$. If $A$ is an $n \times n$ matrix,
$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$ is the trace of $A$, $\lambda_j(A)$ is the $j$th
characteristic root of $A$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, $|A|$ is the
determinant of $A$, and if $A$ is nonsingular $A^{-1}$ is the inverse of $A$ and
$A'^{-1} = (A')^{-1} = (A^{-1})'$.

If $f(x)$ is a $p \times 1$ vector function of the $p \times 1$ vector $x$, then the
Jacobian of $f$, $J_f(x)$, is a $p \times p$ matrix function of $x$ defined by
$[J_f(x)]_{ij} = \partial [f_i(x)] / \partial x_j$. It then follows that if $f(x) = A g(x)$,
where $f$ and $g$ are $p \times 1$ vector functions of the $p \times 1$ vector $x$ and $A$ is
$p \times p$ nonsingular, $J_f(x) = A J_g(x)$. A frequently used notation will
be $G(\psi, Y)$, where $G$ is a $p \times 1$ vector function of a $p \times 1$ variable $\psi$ and a
random variable $Y$ ($Y$ may also be a vector). Analyses will then be
performed on $G$ relative to $\psi$ for fixed $Y$ and $Y$ may be required to
belong to some set in its probability space (which set will have large
probability). For such a $G$ the $p \times p$ Jacobian matrix $J_G(\psi, Y)$ is defined by
$[J_G(\psi, Y)]_{ij} = \partial [G_i(\psi, Y)] / \partial \psi_j$. Another notation which will be used may
seem confusing at first. $\psi_n$ is often used as a statistical parameter
and $\psi$ as an argument in a function in the same expression; the notation

$$\left. \frac{\partial \lambda(y, \psi)}{\partial \psi_i} \right|_{\psi = \psi_n}$$

may appear, where $\lambda$ is a scalar function of $y$ and $\psi$ and
$y$ is a random variable. This expression contains $\psi$, the variable which
is differentiated with respect to in the function $\lambda$, and also $\psi_n$, a
parameter, one of a sequence of parameters under consideration. This
dual role of $\psi$ parallels the use of dummy variables in an integration
(e.g. $f(x) = \int g(x)\,dx$). Differentiation by a subvector will be as
follows: Let $\psi' = (\beta', \tau')$; then

$$\left. \frac{\partial \lambda(y, \psi)}{\partial \beta_j} \right|_{\psi = \psi_n}$$

may be written if that is more convenient than using a complicated subscript
scheme. Vector derivatives may also be used where appropriate; for
example, $\partial \lambda / \partial \beta$ where $\beta$ is $p_0 \times 1$ yields a $p_0 \times 1$ vector of derivatives and
$\partial^2 \lambda / \partial \beta \, \partial \beta'$ yields a $p_0 \times p_0$ matrix of second derivatives.

Linear spaces are denoted by script letters and are column spaces;
that is, $\mathcal{L}(X)$ is the linear space formed by all linear combinations of
the columns of $X$. The symbol $\oplus$ denotes direct sum; if $\mathcal{X} = \mathcal{L}(X)$ and
$\mathcal{Y} = \mathcal{L}(Y)$ then $\mathcal{Z} = \mathcal{X} \oplus \mathcal{Y} = \{z \mid z = x + y,\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$. Spheres of radius
$r$ about $x_0$ are represented by $S_r(x_0) = \{x \mid \|x - x_0\| < r\}$ for whatever
norm $\|\cdot\|$ is being used. If $S \subset R^p$, then $\bar{S}$ is the closure of $S$.

Convergence in distribution is represented by $\xrightarrow{d}$ and convergence
in probability by $\xrightarrow{p}$. Both of these concepts may be used with vectors
or matrices. For instance, "$A_n(Y_n) \xrightarrow{p} A$", where $A_n(Y_n)$ is a $p \times p$ matrix
function of a random variable $Y_n$ and $A$ is a $p \times p$ constant matrix, means
$P\{\|A_n(Y_n) - A\| > \delta\} \to 0$ as $n \to \infty$ for each $\delta > 0$. Since both vector and matrix norms are
continuous functions of their elements, it is sufficient to prove convergence
for each element of the vector or matrix separately.

A $p$ dimensional multivariate normal random vector is denoted
$\mathcal{N}_p(\mu, \Sigma)$, where $\mu$ is its expected value and $\Sigma$ is its covariance matrix.
For a univariate normal, the subscript is dropped. $\chi^2_p$ is a chi-square
random variable with $p$ degrees of freedom. The symbol ~ means "distributed
as"; thus $X \sim \chi^2_p$ means the random variable $X$ has a chi-square
distribution with $p$ degrees of freedom.

One  practice  followed  in  this  paper  which  the  reader  might  find 
confusing is the notation of dependence on $n$. "$n \to \infty$" is used to
denote that the size of the entire design becomes infinite ($y$ is $n \times 1$).
All other elements of the problem, including the sizes of vectors and
matrices, depend on $n$; only $p_0$ and $p_1$, the number of parameters in the
model, remain fixed. The dependence on $n$ does not always appear in
the notation. It is suppressed when it is obvious that such dependence
exists. When it is not suppressed, it is usually to emphasize the
dependence on $n$. The reader, being forewarned, should not be disturbed
by the seeming inconsistency; as one becomes familiar with the topic,
the  inconsistency  disappears. 

This paper is divided into seven chapters labeled 1-7 and four
appendices labeled A-D. When a chapter is divided into sections, these
sections are labeled 1,2,... and are prefixed by the chapter label
(e.g. Section 1.2, Section A.3). Section A.4 is further divided into
five subsections labeled A.4.1-A.4.5. Theorems, lemmae, propositions,
and assumptions are numbered consecutively within sections and prefixed
by the section label (e.g. Assumption 1.3.5, Proposition A.3.4). If
there is only one theorem in a section it is still labeled as the first
theorem in the section (e.g. Theorem 3.2.1). In the case of the
subsections A.4.1-A.4.5, the lemmae appearing in these subsections are
labeled A.4.1-A.4.5 instead of A.4.1.1-A.4.5.1 because the first set
of numbers relates to Section A.4 as a whole and that is the rule used
to number these lemmae.


1.3.  Basic Analysis of Variance Model and Assumptions About It

The basic model used will be the mixed model analysis of variance,
which can be written as

$$y = X\alpha + U_1 b_1 + U_2 b_2 + \cdots + U_{p_1} b_{p_1} + e,$$

where

$y$ is an $n \times 1$ vector of observations;

$X$ is an $n \times p_0$ matrix of known constants;

$\alpha$ is a $p_0 \times 1$ vector of unknown constants;

$U_i$ is an $n \times m_i$ matrix of known constants, $i = 1, 2, \ldots, p_1$;

$b_i$ is an $m_i \times 1$ random vector, $i = 1, 2, \ldots, p_1$;

$e$ is an $n \times 1$ random vector.

Thus $X$ is the design matrix for the fixed effects and the $U_i$ are the
design matrices for the random effects $b_i$. Let $G_i = U_i U_i'$,
$i = 1, 2, \ldots, p_1$, and $G_0 = I_n$. The following assumptions are made about
the model:

ASSUMPTION 1.3.1. The random vectors $b_1, b_2, \ldots, b_{p_1}, e$ are
mutually independent, with $e \sim \mathcal{N}_n(0, \sigma_0 I_n)$ and $b_i \sim \mathcal{N}_{m_i}(0, \sigma_i I_{m_i})$,
$i = 1, 2, \ldots, p_1$.

ASSUMPTION 1.3.2. The matrix $X$ has full rank $p_0$.

ASSUMPTION 1.3.3. $n \ge p_0 + p_1 + 1$.

ASSUMPTION 1.3.4. The partitioned matrix $[X : U_i]$ has rank greater
than $p_0$, $i = 1, 2, \ldots, p_1$.

ASSUMPTION 1.3.5. The matrices $G_0, G_1, \ldots, G_{p_1}$ are linearly
independent; that is, $\sum_{i=0}^{p_1} \sigma_i G_i = 0$ implies $\sigma_i = 0$, $i = 0, 1, \ldots, p_1$.

These assumptions are sufficient to find the estimates; however, one
further assumption about the $U_i$ will be made.

ASSUMPTION 1.3.6. The matrix $U_i$ consists only of zeros and ones
and there is exactly one 1 in each row and at least one 1 in each
column, $i = 1, 2, \ldots, p_1$.

2. This differs from the usual convention of using $\sigma_i^2$. $\sigma_i$ is used as
a variance to avoid writing many squares. This also follows the
notation of Anderson (1969), (1970), (1971b), (1973).

The above assumptions can be explained in analysis of variance
terms. Assumption 1.3.1 is the usual assumption of the independence
and normality of the random effects. Assumption 1.3.2 can always be
satisfied by a suitable reparameterization of the problem. Assumption
1.3.3 says there are at least as many observations as parameters.
Assumption 1.3.4 says that the fixed effects are not confounded with
any of the random effects. Assumption 1.3.5 says that the random
effects are not confounded with each other. Assumption 1.3.6 just
says that the $U_i$ are standard design matrices and it has three
consequences: $U_i' U_i = D_i$, an $m_i \times m_i$ nonsingular diagonal matrix; $U_i$
has full rank $m_i$; and $m_i \le n$.
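
The consequences of Assumption 1.3.6 can be illustrated with a small sketch. It builds a 0/1 design matrix from a list of factor levels (one label per observation) and forms U'U, which comes out diagonal with the number of observations at each level on the diagonal. The helper names `design_matrix` and `gram`, and the example labels, are illustrative assumptions rather than anything defined in this paper.

```python
def design_matrix(levels):
    """Build an n x m matrix of zeros and ones with exactly one 1 in each
    row, from a list giving the level of a random factor per observation."""
    distinct = sorted(set(levels))
    column = {lev: j for j, lev in enumerate(distinct)}
    m = len(distinct)
    return [[1 if column[lev] == j else 0 for j in range(m)]
            for lev in levels]


def gram(U):
    """Return U'U as a list of lists."""
    n, m = len(U), len(U[0])
    return [[sum(U[k][i] * U[k][j] for k in range(n)) for j in range(m)]
            for i in range(m)]


if __name__ == "__main__":
    U = design_matrix(["a", "a", "b", "c", "c", "c"])
    for row in gram(U):
        print(row)
```

Because every level that appears contributes at least one observation, each diagonal entry is positive, so the diagonal matrix is nonsingular and the number of columns cannot exceed the number of rows.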

It follows from the above assumptions that $y$ has a normal distribution
with

$$E(y) = X\alpha,$$

$$\mathrm{Cov}(y) = E(y - X\alpha)(y - X\alpha)' = \sigma_0 I_n + \sigma_1 U_1 U_1' + \cdots + \sigma_{p_1} U_{p_1} U_{p_1}' = \Sigma.$$

Thus $y \sim \mathcal{N}_n(X\alpha, \Sigma)$. The problem considered here is to observe $y$ and
estimate $\alpha, \sigma_0, \sigma_1, \ldots, \sigma_{p_1}$ by the method of maximum likelihood.

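The parameter space is defined as follows: Let $p = p_0 + p_1 + 1$.
$\Theta \subset R^p$ is the parameter space; if $\theta \in \Theta$ then $\theta' = (\alpha', \sigma')$, where
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{p_0})'$ and $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_{p_1})'$. The restrictions
which form $\Theta$ are $\Theta = \{\theta \in R^p \mid \theta' = (\alpha', \sigma');\ \alpha \in R^{p_0};\ \sigma_0 > 0;\ \sigma_i \ge 0,\ i = 1, 2, \ldots, p_1\}$.
If the likelihood function of $y$ and $\theta$ is $L(y, \theta)$ it
follows from the multivariate normal density that

$$\lambda(y, \theta) = \log L(y, \theta) = -\tfrac{1}{2} n \log 2\pi - \tfrac{1}{2} \log|\Sigma| - \tfrac{1}{2} (y - X\alpha)' \Sigma^{-1} (y - X\alpha).$$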


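Differentiation of $\lambda$ by $\alpha$ and $\sigma$ leads to the following equations which
must be solved simultaneously for $\hat{\alpha}$ and $\hat{\sigma}$:

$$\Big[ X' \Big( \sum_{j=0}^{p_1} \hat{\sigma}_j G_j \Big)^{-1} X \Big] \hat{\alpha} = X' \Big( \sum_{j=0}^{p_1} \hat{\sigma}_j G_j \Big)^{-1} y,$$

$$\mathrm{tr} \Big( \sum_{j=0}^{p_1} \hat{\sigma}_j G_j \Big)^{-1} G_i = \mathrm{tr} \Big( \sum_{j=0}^{p_1} \hat{\sigma}_j G_j \Big)^{-1} G_i \Big( \sum_{j=0}^{p_1} \hat{\sigma}_j G_j \Big)^{-1} (y - X\hat{\alpha})(y - X\hat{\alpha})', \quad i = 0, 1, \ldots, p_1.$$

These equations can be abbreviated

$$[X' \hat{\Sigma}^{-1} X] \hat{\alpha} = X' \hat{\Sigma}^{-1} y,$$

$$\mathrm{tr}\, \hat{\Sigma}^{-1} G_i = \mathrm{tr}\, \hat{\Sigma}^{-1} G_i \hat{\Sigma}^{-1} (y - X\hat{\alpha})(y - X\hat{\alpha})', \quad i = 0, 1, \ldots, p_1,$$

where $\hat{\Sigma}$ is a function of $\hat{\sigma}$.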


Such maximization by differentiating is justified in Anderson
(1970).
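
For a small design the log-likelihood of this section can be evaluated directly. The following is a minimal sketch, not the program of Appendix C: it assembles the covariance matrix from the variances and the design matrices, factors it by a plain Cholesky decomposition, and evaluates the log-likelihood from the log-determinant and the quadratic form. All function and argument names are illustrative assumptions, and no attempt is made at numerical efficiency.

```python
import math


def covariance(sigma, U_list, n):
    """Sigma = sigma[0] * I_n + sum_i sigma[i] * U_i U_i'."""
    S = [[sigma[0] if r == c else 0.0 for c in range(n)] for r in range(n)]
    for s_i, U in zip(sigma[1:], U_list):
        m = len(U[0])
        for r in range(n):
            for c in range(n):
                S[r][c] += s_i * sum(U[r][k] * U[c][k] for k in range(m))
    return S


def cholesky(S):
    """Lower triangular L with L L' = S (S must be positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_likelihood(y, X, alpha, sigma, U_list):
    """-(n/2) log 2 pi - (1/2) log|Sigma| - (1/2) (y - X a)' Sigma^{-1} (y - X a)."""
    n = len(y)
    resid = [y[r] - sum(X[r][k] * alpha[k] for k in range(len(alpha)))
             for r in range(n)]
    L = cholesky(covariance(sigma, U_list, n))
    # Forward substitution solves L z = resid, so the quadratic form is z'z.
    z = []
    for i in range(n):
        z.append((resid[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det - 0.5 * quad


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    X = [[1.0], [1.0], [1.0], [1.0]]
    U = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(log_likelihood(y, X, [2.5], [1.0, 1.0], [U]))
```

A function of this kind can be handed to any general-purpose optimizer as a check on solutions of the likelihood equations, although the iterative procedures discussed in Chapter 5 are the methods actually studied in this paper.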


As was noted in Section 1.1, the maximum likelihood estimates
of the variances cannot be negative, but the solutions of the likelihood
equations may be negative. Thus the following truncation scheme
is necessary. If the solution of the likelihood equations yields a
negative estimate of $\sigma_i$ for $i \in S$ (where $S$ is some set of indices)
then solve the following set of equations:

$$\frac{\partial \lambda}{\partial \alpha} = 0,$$

$$\frac{\partial \lambda}{\partial \sigma_i} = 0, \quad i \notin S,$$

$$\sigma_i = 0, \quad i \in S.$$

If a negative estimate occurs for some other $\sigma_i$, add its index to the
set $S$ and repeat the above procedure. Continue in this manner until
no negative estimates are obtained. As is pointed out in Section 5.6
this is easily done for this model.

The computational methods used to solve the likelihood equations
are discussed in Chapter 5. The asymptotic properties of such solutions
are discussed in Chapter 4.
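
The truncation scheme of this section can be followed by hand in the simplest special case. The sketch below assumes a balanced one-way random effects layout (a groups, r >= 2 observations per group, a single fixed mean), for which the likelihood equations have a closed-form solution in terms of the within-group and between-group sums of squares. This special case and the function name `one_way_ml` are illustrative assumptions; the general model requires the iterative methods of Chapter 5.

```python
def one_way_ml(groups):
    """Maximum likelihood estimates for a balanced one-way random model.

    groups: list of a lists, each holding r >= 2 observations.
    Returns (mean, sigma0, sigma1, truncated), where sigma0 is the error
    variance and sigma1 the variance of the random effect.
    """
    a = len(groups)
    r = len(groups[0])
    means = [sum(g) / r for g in groups]
    grand = sum(means) / a
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ssb = r * sum((m - grand) ** 2 for m in means)

    # Solution of the likelihood equations, ignoring the constraint.
    sigma0 = ssw / (a * (r - 1))
    sigma1 = (ssb / a - sigma0) / r

    truncated = sigma1 < 0
    if truncated:
        # Set sigma1 = 0 and solve the remaining equation for sigma0.
        sigma1 = 0.0
        sigma0 = (ssw + ssb) / (a * r)
    return grand, sigma0, sigma1, truncated


if __name__ == "__main__":
    print(one_way_ml([[1.0, 3.0], [5.0, 7.0]]))   # no truncation needed
    print(one_way_ml([[1.0, 5.0], [2.0, 4.0]]))   # negative solution, truncated
```

In the second call the group means coincide, the unconstrained solution for the random effect variance is negative, and the scheme sets it to zero and re-solves for the error variance using all of the variation about the grand mean.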


CHAPTER  2 


REVIEW  OF  PAST  LITERATURE 

The  subject  of  variance  component  estimation  has  been  considered 
for  some  time.  An  excellent  overview  of  the  present  state  of  the  art 
may  be  found  in  Searle  (1971) .  He  described  the  development  of  many 
different  techniques  used  and  gave  copious  references.  A  history  of 
the  development  of  the  entire  field  will  not  be  given  here;  instead  a 
short  review  of  the  literature  immediately  pertinent  to  this  paper  is 
presented. 

Theoretical  properties  of  methods  of  estimation  and  testing  of 
variance  components  have  been  considered  by  Herbach  (1959) ,  who 
considered the one- and two-way balanced layouts and proved optimality
results for the usual analysis of variance tests and also derived the
likelihood ratio tests in these cases. Graybill and Hultquist (1961),
Hultquist and Graybill (1965), and Hultquist and Atzinger (1973)
considered the balanced models with respect to minimal sufficient
statistics and proved various optimality results as well as deriving
certain likelihood equations; some of the results of Hultquist and
Atzinger overlap some of those of Anderson (1970) described below.

Several  authors  have  considered  the  problem  of  a  model  where  the 
covariance  matrix  has  a  special  structure.  Wilks  (1946)  considered 
the  intraclass  correlation  coefficient  model.  Olkin  and  Press  (1969) 
studied the circular stationary model. Srivastava (1966) and Srivastava
and Maik (1967) considered a more general model where the covariance
matrix has linear structure and the matrices $G_0, G_1, \ldots, G_{p_1}$ have special
properties. All the above authors derived likelihood ratio tests
(using several diverse techniques) for the particular model under
study. Anderson (1969), (1970), (1971b), (1973) also studied models
where the covariance matrix has linear structure. Anderson's method
of analysis enabled all the above cases to be considered within one
unified framework.

The model Anderson used is $y \sim \mathcal{N}_n(\mu, \Sigma)$, where $\mu = X\beta$ and
$\Sigma = \sum_{i=0}^{p_1} \sigma_i G_i$. The model assumed here is a special case of this.
Anderson derived the likelihood equations and showed how they can be
simplified in certain cases. (See Section 5.2.) He gave conditions for
estimability of the parameters and suggested several methods for the
solution of the likelihood equations. The method studied in this paper
was advanced in (1971b). In (1969), he derived and gave properties of
the likelihood ratio tests of hypotheses about $\beta$ and the $\sigma_i$. He also
proved that the maximum likelihood estimators in this case are consistent
and asymptotically efficient as the entire process is replicated
(that is, as repeated observations are taken on $y$) and he derived the
asymptotic covariance matrix.

H. O. Hartley and J. N. K. Rao (1967) analyzed maximum likelihood
estimation in the mixed model of the analysis of variance, the model
used in this paper. They gave five rationales for using the method of
maximum likelihood in this case, which are paraphrased here.

a)  Computers make easy solution of the likelihood
equations possible.

b)  This  technique  can  be  applied  to  any  model, 
balanced  or  unbalanced. 

c)  The  technique  has  large  sample  optimality  properties. 

d)  Maximum  likelihood  estimates  are  always  functions  of 
the  minimal  sufficient  statistics. 

e)  The  maximum  likelihood  estimates  of  the  variance 
components  are  always  positive. 

Several comments can be made about these rationales. A technique
(similar to Anderson's) is proposed in this paper which is different
from Hartley and Rao's for solving the likelihood equations. This
technique does not guarantee nonnegative estimates, but can easily
be modified so that only nonnegative estimates are finally arrived
at. (See Chapter 5.) (Many writers have considered the problem of
negative estimates; see Searle [1971:22] for a good summary.) Hartley
and Vaughn (1972) developed a computer program to implement the
algorithm of Hartley and Rao. (The computer program to implement the
algorithm proposed here is discussed in Appendix C.) The large sample
optimality referred to above was proved by Hartley and Rao only for a
limited set of designs in which the number of observations at any
particular level of any random factor must remain bounded. Such an
assumption rules out even so simple a model as the two-way crossed
layout random effects model. In this paper the optimality results
are extended to cover almost all interesting cases.

The discussion of asymptotic results has, as a rule, been
confined to independent, identically distributed observations. The
basic techniques were expounded by Cramér (1946), Wald (1949) and
Wolfowitz (1949). As noted in the introduction, these techniques do
not apply to the model under consideration here. Silvey (1961)
discussed asymptotic results for sequences of dependent observations
but again his work cannot be applied in this case. Whitby (1971)
considered estimation for the generalized beta distribution; although
he was also considering sequences of independent, identically
distributed random variables, he did prove some general theorems which
have been extended to cover the analysis of variance model of this
paper. Whitby used only one normalizing sequence for all the
estimates. The theorems presented here allow a different normalizing
sequence for the estimate of each parameter. This is done by the
artifice of building the normalizing sequence right into the
parameter. The result is a sequence of parameters, the estimates of
which have certain properties; these properties can then easily be
translated to properties of the desired estimators. (See Chapter 4.)

This is only a cursory review of the subject of maximum
likelihood estimation in the analysis of variance. For a more detailed
study, the reader is referred to the papers of Searle (1971); Anderson
(1971b), (1973); Hartley and Rao (1967); and Whitby (1971).


CHAPTER  3 


BASIC  THEOREMS  ON  ASYMPTOTIC  BEHAVIOR

3.1.  Introduction 

In this chapter two theorems are presented which will be used to
prove the asymptotic properties of maximum likelihood estimators in
the analysis of variance. Theorem 3.2.1 is a form of inverse function
theorem and is used to prove Theorem 3.3.1. Theorem 3.3.1 is a very
general theorem concerning asymptotic theory and is used to prove
Theorem 4.4.1, the main asymptotic result of this paper. However,
whereas Theorem 3.2.1 has little intrinsic interest except for the
mathematics of its proof, Theorem 3.3.1 has applications beyond that
of a lemma for Theorem 4.4.1. Theorem 3.3.1 has wide applicability
and can be used to prove most of the standard asymptotic results
concerning roots of the likelihood equation.

Theorem 3.2.1 is a form of inverse function theorem. It concerns
roots of the vector equation $G(x) = 0$. When $G$ has the form
$G(x) = a + A(x - x_0) + r(x)$, $a$ and $r$ are small relative to $A$, and
certain continuity and differentiability conditions are met, then
there is a root of $G(x) = 0$ near $x_0$. Similar theorems are often used
in proving results about the roots of the likelihood equations. The
method of proof of Theorem 3.2.1 is patterned on the proof of The
Inverse Function Theorem (Theorem 9.17) of Rudin (1964:193-195). The
form of the theorem is patterned on a lemma (Lemma 3.1) of Whitby
(1971:8-9). Whitby based his proof on the proof in the first edition
of Rudin's book; Rudin's first proof (and hence Whitby's) required
that the norm of a vector be the Euclidean norm,
$\|x\| = (\sum_i x_i^2)^{1/2}$. The proof given here is slightly more
general in that it allows any vector norm to be used. (See Isaacson
and Keller [1966:3-4] for the definition of a vector norm.) The
assumptions of Theorem 3.2.1 are stated somewhat differently than
those of Whitby's Lemma 3.1 in order to facilitate the different
method of proof used here.
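The situation Theorem 3.2.1 describes can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the diagonal matrix `A_diag`, the vector `a`, the remainder `r`, and the value of `eta` are all hypothetical, and the root is located by a simple fixed-point iteration rather than by the argument used in the proof. It checks that a root of $G(x) = a + A(x - x_0) + r(x) = 0$ is found close to $x_0$ when $a$ and $r$ are small relative to $A$.

```python
def G(x, x0, a, A_diag, r):
    """Evaluate G(x) = a + A(x - x0) + r(x) for a diagonal matrix A."""
    rx = r(x)
    return [a[i] + A_diag[i] * (x[i] - x0[i]) + rx[i] for i in range(len(x))]


def solve_near(x0, a, A_diag, r, iters=50):
    """Fixed-point iteration x <- x0 - A^{-1}(a + r(x)); a root of G is a fixed point."""
    x = list(x0)
    for _ in range(iters):
        rx = r(x)
        x = [x0[i] - (a[i] + rx[i]) / A_diag[i] for i in range(len(x0))]
    return x


# Hypothetical example: all numbers below are illustrative assumptions.
x0 = [1.0, 2.0]
a = [0.02, -0.03]          # small constant term
A_diag = [2.0, 3.0]        # diagonal of A, so A^{-1}a = (0.01, -0.01)


def r(x):
    """Small second-order remainder, vanishing at x0."""
    d0, d1 = x[0] - x0[0], x[1] - x0[1]
    return [0.1 * d0 * d0, 0.1 * d0 * d1]


root = solve_near(x0, a, A_diag, r)
residual = max(abs(g) for g in G(root, x0, a, A_diag, r))
distance = max(abs(root[i] - x0[i]) for i in range(2))   # max-norm distance
eta = 0.02                 # any eta exceeding the norm of A^{-1}a
```

Here the max norm is used, which is one instance of the "any vector norm" allowed by the theorem; the computed root lies well inside the ball of radius $4\eta$ about $x_0$.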

Theorem 3.3.1 contains powerful asymptotic results and has
intrinsic interest because of its wide applicability. Theorem 3.3.1
concerns a sequence of estimators of a sequence of constants (or
constant vectors). The sequence of estimators becomes close (in a
well defined sense) to the sequence of constants with high probability;
furthermore, the sequence of estimates is asymptotically normal. As
seen in Chapter 4, this sequence of estimates can easily be translated
into a sequence of consistent and asymptotically normal estimates of
the parameters in the analysis of variance problem. The basic setup
and method of proof of Theorem 3.3.1 are as follows: For each $n$ (of a
sequence of values of $n$ increasing to infinity) there is a vector
function $G_n$, a vector of parameters $\psi_n$, and a random vector
$Y_n$. ($G_n$ will be the likelihood equations.) Then for each $Y_n$ in a
certain set of $Y_n$ values, it is shown that $G_n(\psi_n, Y_n)$, as a function
of $\psi_n$, satisfies the conditions of Theorem 3.2.1; thus there is a root
$\hat{\psi}_n(Y_n)$ of $G_n(\psi_n, Y_n) = 0$ near $\psi_{0n}$. ($\psi_{0n}$ will be analogous to the "true"
parameter point.) If $n$ is large enough, the set of $Y_n$ values will have
large probability. It is also shown that $\hat{\psi}_n(Y_n) - \psi_{0n}$ converges in
distribution to a multivariate normal distribution.

The wide applicability of Theorem 3.3.1 results from the fact that
a sequence of estimates $\{\hat{\psi}_n(Y_n)\}$ of a sequence of parameters $\{\psi_{0n}\}$ is
considered instead of a sequence of estimates $\{\hat{\theta}_n(Y_n)\}$ of a single
parameter $\theta_0$. The estimates of the single parameter $\theta_0$ will require
normalizing sequences and any proofs of asymptotic properties must take
explicit account of these normalizing sequences. In Theorem 3.3.1 the
normalizing factors are built into the estimates and parameters and
are not explicitly mentioned. (In Chapter 4 it is shown how $\psi_n$ is
obtained from $\theta$ by multiplying each element of $\theta$ by the appropriate
normalizing factor.) The fact that the normalizing sequences are not
specifically mentioned in Theorem 3.3.1 allows it to be used in the
analysis of variance (where the estimate of each parameter may require
a different normalizing sequence), in a case where the estimate of
each parameter can be properly normalized by the square root of the
number of observations, or in almost any other case of maximum
likelihood estimation. For instance, Theorem 3.3.1 can easily be applied to
yield consistency and asymptotic normality in the case of independent,
identically distributed observations given in most textbooks (e.g.
Cramer [1946]).
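The device of building the normalizing factors into the parameters can be sketched in a few lines. The example below is hypothetical: the sizes `n` and `m`, the choice of square-root normalizers, and the numerical values of `theta0` and `theta_hat` are assumptions made purely for illustration, not values taken from this paper. It shows only the bookkeeping, namely that each component is scaled by its own factor and that differences of the scaled quantities are the normalized estimation errors.

```python
import math


def to_psi(theta, normalizers):
    """Scale each component of theta by its own normalizing factor."""
    return [c * t for c, t in zip(normalizers, theta)]


def from_psi(psi, normalizers):
    """Undo the scaling, recovering theta from psi."""
    return [p / c for c, p in zip(normalizers, psi)]


# Hypothetical sizes: n observations and m levels of one random factor.
n, m = 400, 25
normalizers = [math.sqrt(n), math.sqrt(m)]   # assumed rates, for illustration only

theta0 = [2.0, 0.5]       # assumed "true" parameter values
theta_hat = [2.1, 0.7]    # assumed estimates

psi0 = to_psi(theta0, normalizers)
psi_hat = to_psi(theta_hat, normalizers)

# psi_hat - psi0 is the normalized error; no normalizer appears explicitly.
diff = [ph - p0 for ph, p0 in zip(psi_hat, psi0)]
```

Statements about `diff` translate back to statements about `theta_hat - theta0` by dividing each component by its factor, which is what `from_psi` does.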

The precise statements and proofs of Theorems 3.2.1 and 3.3.1
will be given in the following two sections.
a)  $J_f(x_1) \equiv B$ is nonsingular,

b)  $\|B^{-1}\| < 2$,

c)  For all $x \in S_{3\eta}(x_1)$, $\|J_f(x) - B\| < 1/(2\|B^{-1}\|)$.

Then there exists $\hat{x}$, a solution of $G(x) = 0$, such that $\hat{x} \in S_{4\eta}(x_0)$.

(Note that Condition 3.2.1.v is equivalent to stating that for
all $x \in S_{4\eta}(x_0)$, $\partial G_i(x) / \partial x_j$ exists and is continuous,
$i, j = 1, 2, \ldots, p$.)

PROOF.

First note that 3.2.1.v implies that $f(x)$ is continuously
differentiable in $S_{4\eta}(x_0)$. Second, since $\|x_1 - x_0\| = \|A^{-1}a\| < \eta$ by
3.2.1.iii, $S_{3\eta}(x_1) \subset S_{4\eta}(x_0)$. Now proceed as in Rudin's proof. Let

$$\lambda = \frac{1}{4\|B^{-1}\|},$$

where $B$ is as above.

Then $\|J_f(x) - B\| < 2\lambda$ for all $x \in S_{3\eta}(x_1)$ by 3.2.1.vi.c. Now
suppose $x \in S_{3\eta}(x_1)$ and $h$ is such that $x + h \in S_{3\eta}(x_1)$. Let
$F(t) \equiv f(x + th) - tBh$, $0 \le t \le 1$. (i.e. $F: [0,1] \to R^p$.) Since
any norm is a convex function on $R^p$, $x + th \in S_{3\eta}(x_1)$ for $t \in [0,1]$.
Thus

$$\|F'(t)\| = \|J_f(x + th)h - Bh\| \le \|J_f(x + th) - B\| \, \|h\|$$

by definition of $\|\cdot\|$ for matrices,

$$\le 2\lambda \|h\|$$

by 3.2.1.vi.c,

$$= 2\lambda \|B^{-1}Bh\| \le 2\lambda \|B^{-1}\| \, \|Bh\| = \tfrac{1}{2}\|Bh\|$$

by definition of $\lambda$. Theorem 5.20 of Rudin (1964:99) states: If $F$ is
a continuous mapping of $[a,b]$ into $R^p$ and $F$ is differentiable in $(a,b)$,
then there exists $\tau \in (a,b)$ such that $\|F(b) - F(a)\| \le (b-a)\|F'(\tau)\|$.
Since $F$ above is indeed continuous on $[0,1]$ and differentiable in
$(0,1)$, this theorem applies and

$$\|f(x+h) - f(x) - Bh\| = \|F(1) - F(0)\| \le \|F'(\tau)\| \le \tfrac{1}{2}\|Bh\|.$$


The triangle inequality, $\|y + z\| \le \|y\| + \|z\|$, can be applied to
$z = w - y$ to give $\big|\,\|w\| - \|y\|\,\big| \le \|w - y\|$. This property is used with
$w = Bh$, $y = f(x+h) - f(x)$ to give

$$\|Bh\| - \|f(x+h) - f(x)\| \le \|f(x+h) - f(x) - Bh\| \le \tfrac{1}{2}\|Bh\|.$$

This implies

$$\|f(x+h) - f(x)\| \ge \tfrac{1}{2}\|Bh\| \ge 2\lambda\|h\| \qquad (*)$$

whenever $x$ and $x+h$ belong to $S_{3\eta}(x_1)$. (The second inequality holds
because $\|h\| = \|B^{-1}Bh\| \le \|B^{-1}\| \, \|Bh\|$.) Hence $f(\cdot)$ is 1-1 on $S_{3\eta}(x_1)$.

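Now $S_{2\eta}(x_1) \subset S_{3\eta}(x_1)$ and $\bar{S}_{2\eta}(x_1) \subset S_{3\eta}(x_1)$ ($\bar{S}$ = closure of $S$).
To prove that $f[S_{2\eta}(x_1)]$ contains the sphere $S_{2\lambda\eta}[f(x_1)]$, note that
if

$$y_1 \equiv f(x_1) = A^{-1}G(x_1) = A^{-1}a + (x_1 - x_0) + A^{-1}r(x_1) = A^{-1}r(x_1)$$

(since $x_1 - x_0 = -A^{-1}a$), then $\|y_1\| < \eta/4$ by 3.2.1.iv and

$$2\lambda\eta = \frac{2\eta}{4\|B^{-1}\|} > \frac{\eta}{4}$$

because $\|B^{-1}\| < 2$ by 3.2.1.vi.b. Thus $\|y_1\| < 2\lambda\eta$ and $S_{2\lambda\eta}(y_1)$ contains $0$.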

Let $T = \{x \mid \|x - x_1\| = 2\eta\}$. Now fix $y \in S_{2\lambda\eta}(y_1)$ (i.e. $\|y - y_1\| < 2\lambda\eta$)
and define $\phi(x) = \|y - f(x)\|$. It must be shown that there exists
$x^* \in S_{2\eta}(x_1)$ such that $f(x^*) = y$ (i.e. such that $\phi(x^*) = 0$).

First note that if $x \in T$, then (*) implies

$$4\lambda\eta = 2\lambda\|x - x_1\|$$

by definition of $T$,

$$\le \|f(x) - f(x_1)\|$$

by (*) with $h = x - x_1$,

$$\le \|f(x) - y\| + \|y - y_1\|$$

by triangle inequality [$y_1 = f(x_1)$],

$$= \phi(x) + \phi(x_1) < \phi(x) + 2\lambda\eta$$

because $\phi(x_1) = \|y - y_1\| < 2\lambda\eta$ because $y \in S_{2\lambda\eta}(y_1)$. Thus
$\phi(x_1) < 2\lambda\eta < \phi(x)$ for all $x \in T$.

Norms are continuous functions of the components and $f$ is
continuous so $\phi(x)$ is continuous. Further, $\bar{S}_{2\eta}(x_1)$ is compact.
Hence there exists $x^* \in \bar{S}_{2\eta}(x_1)$ such that $\phi(x^*) \le \phi(x)$ for all
$x \in \bar{S}_{2\eta}(x_1)$. But $x^*$ cannot belong to $T$ because $\phi(x_1) < \phi(x)$ for all
$x \in T$. Let $w = y - f(x^*)$. Since $B$ is nonsingular, let $h = B^{-1}w$. Then
choose $t \in (0,1)$ so small that $x^* + th \in S_{2\eta}(x_1)$. Then

$$\|y - f(x^*) - Bth\| = \|w - tw\| = \|(1-t)w\| = (1-t)\|w\|.$$

$$\|f(x^* + th) - f(x^*) - Bth\| \le \tfrac{1}{2}\|Bth\|$$

by the argument above,

$$= \tfrac{1}{2}t\|w\|.$$
Then

$$\phi(x^* + th) = \|y - f(x^* + th)\|$$

$$= \|y - f(x^*) - Bth + Bth + f(x^*) - f(x^* + th)\|$$

$$\le \|y - f(x^*) - Bth\| + \|Bth + f(x^*) - f(x^* + th)\|$$

$$\le (1-t)\|w\| + \tfrac{1}{2}t\|w\| = (1 - \tfrac{1}{2}t)\|w\| = (1 - \tfrac{1}{2}t)\,\phi(x^*).$$

If $\phi(x^*) > 0$, then $0 < t < 1$ implies $(1 - \tfrac{1}{2}t) < 1$ which in turn implies
$\phi(x^* + th) < \phi(x^*)$, which contradicts the minimal property of $x^*$.
Therefore $\phi(x^*) = \|y - f(x^*)\| = 0$, which means $y = f(x^*)$. This argument
can be used for each $y \in S_{2\lambda\eta}(y_1)$, yielding $x^* \in S_{2\eta}(x_1)$ such
that $y = f(x^*)$. In particular it can be used for $0 \in S_{2\lambda\eta}(y_1)$, yielding $\hat{x}$
such that $f(\hat{x}) = 0$. But this says $f(\hat{x}) = A^{-1}G(\hat{x}) = 0$, which implies
$G(\hat{x}) = 0$; furthermore $\hat{x} \in S_{2\eta}(x_1) \subset S_{4\eta}(x_0)$. This completes the proof
of Theorem 3.2.1.

At this point it can be mentioned that not only does the above
proof imply that the described x exists but that it is unique in
S^(xq).  Unfortunately  the  theorem  does  not  yield  complete  uniqueness, 
just  uniqueness  in  the  neighborhood.  In  the  subsequent  statistical 
applications  of  this  theorem,  the  limited  uniqueness  is  of  little  value. 
This  is  why  it  has  not  been  stated  as  a  conclusion  of  the  theorem. 
3.3.  A  General  Asymptotic  Theorem

The theorem in this section is a general theorem on asymptotic
results. It concerns a sequence of estimates $\{\hat{\psi}_n(Y_n)\}$ of a sequence
of parameters $\{\psi_{0n}\}$. It states that for $n$ large enough, there will
be a root $\hat{\psi}_n(Y_n)$ of $G_n(\psi_n, Y_n) = 0$ near $\psi_{0n}$ with large probability; it
also states that $\hat{\psi}_n(Y_n) - \psi_{0n}$ converges in distribution to a
multivariate normal distribution. In the application of this theorem, the
estimates and parameters $\hat{\psi}_n$, $\psi_n$, and $\psi_{0n}$ are obtained from the
estimates and parameters in the usual problem, $\hat{\theta}_n$, $\theta$, and $\theta_0$, by
multiplication of each component of $\hat{\theta}_n$, $\theta$, or $\theta_0$ by the appropriate
normalizing factor (see Section 4.3). Furthermore, the function $G_n$ represents
the (normalized) likelihood equations; therefore, this theorem deals
with maximum likelihood estimates which are solutions of the likelihood
equations.

Theorem 3.3.1 is proved by demonstrating that for $G_n(\psi_n, Y_n)$ as
defined, for $n$ large enough, the conditions of Theorem 3.2.1 are true
except for an event with small probability. The results of Theorem
3.2.1 then immediately imply the results of Theorem 3.3.1.


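THEOREM 3.3.1.  For a sequence of values of $n$ approaching infinity
define for each such $n$: $Y_n$ an $n \times 1$ vector valued random variable,
$a_n(Y_n)$ a $p \times 1$ vector function of $Y_n$, $A_n(Y_n)$ a $p \times p$ symmetric matrix
function of $Y_n$, $\psi_{0n}$ a $p \times 1$ vector, $G_n(\psi_n, Y_n)$ and $R_n(\psi_n, Y_n)$ $p \times 1$ vector
functions of $\psi_n$ and $Y_n$, and $J$ a $p \times p$ positive definite constant matrix.
Suppose the following six conditions are true.

i)  For each $b > 0$, given $\epsilon > 0$ there exists $n_0(b, \epsilon)$ such
that for all $n > n_0$

$$P\{G_n(\psi_n, Y_n) = a_n(Y_n) + A_n(Y_n)(\psi_n - \psi_{0n}) + R_n(\psi_n, Y_n) \text{ for all } \psi_n \in S_b(\psi_{0n})\} \ge 1 - \epsilon.$$

ii)  $a_n(Y_n) \xrightarrow{D} N_p(0, J)$.

iii)  $A_n(Y_n) + J \xrightarrow{P} 0$.

iv)  For each $b > 0$, given $\epsilon > 0$ and $\delta > 0$ there exists
$n_0(b, \epsilon, \delta)$ such that for $n > n_0$

$$P\{\sup_{\psi_n \in S_b(\psi_{0n})} \|R_n(\psi_n, Y_n)\| < \delta\} \ge 1 - \epsilon.$$

v)  For each $b > 0$, given $\epsilon > 0$, there exists $n_0(b, \epsilon)$
such that for all $n > n_0$

$$P\{\text{the elements of } J_{G_n}(\psi_n, Y_n) \text{ are continuous functions of } \psi_n \text{ in } S_b(\psi_{0n})\} \ge 1 - \epsilon.$$

vi)  Let $J_{G_n}(\psi_n, Y_n) \equiv -J + E_n(\psi_n, Y_n)$. For each $b > 0$,
given $\epsilon > 0$ and $\delta > 0$, there exists $n_0(b, \epsilon, \delta)$ such that
for all $n > n_0$

$$P\{\sup_{\psi_n \in S_b(\psi_{0n})} \|E_n(\psi_n, Y_n)\| < \delta\} \ge 1 - \epsilon.$$

Then it follows that given $\epsilon > 0$ there exist $b = b(\epsilon)$ and $n_0 = n_0(\epsilon)$
such that for all $n > n_0$

$$P\{\text{there exists a root } \hat{\psi}_n(Y_n) \text{ of } G_n(\psi_n, Y_n) = 0 \text{ such that } \hat{\psi}_n(Y_n) \in S_b(\psi_{0n})\} \ge 1 - \epsilon.$$

Furthermore, for this root, $\hat{\psi}_n(Y_n) - \psi_{0n} \xrightarrow{D} N_p(0, J^{-1})$.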


PROOF.

Let $\epsilon > 0$ be given and let $\epsilon^* = \epsilon/10$.

Define $D_n(Y_n) \equiv J + A_n(Y_n)$. Then

$$A_n(Y_n) = -J + D_n(Y_n) = -J[I - J^{-1}D_n(Y_n)].$$

If $\|D_n(Y_n)\|$ is small (which is true in probability by 3.3.1.iii) then
$A_n^{-1}(Y_n)$ exists and is given by

$$A_n^{-1}(Y_n) = -[I - J^{-1}D_n(Y_n)]^{-1}J^{-1}.$$

Let

$$D_n^*(Y_n) \equiv A_n^{-1}(Y_n) + J^{-1}$$

$$= A_n^{-1}(Y_n)[J + A_n(Y_n)]J^{-1}$$

$$= A_n^{-1}(Y_n)D_n(Y_n)J^{-1}$$

$$= -[I - J^{-1}D_n(Y_n)]^{-1}J^{-1}D_n(Y_n)J^{-1}.$$

r*->  r>->  /nJI  Mtl  ^  Ml  ~-Jl  ^ 
Then

$$\|D_n^*(Y_n)\| = \|[I - J^{-1}D_n(Y_n)]^{-1}J^{-1}D_n(Y_n)J^{-1}\|$$

$$\le \|[I - J^{-1}D_n(Y_n)]^{-1}\| \cdot \|J^{-1}\| \cdot \|D_n(Y_n)\| \cdot \|J^{-1}\|$$

$$\le \frac{\|J^{-1}\|^2 \, \|D_n(Y_n)\|}{1 - \|J^{-1}D_n(Y_n)\|}$$

(if $\|D_n(Y_n)\|$ is small) since $\|ab\| \le \|a\| \cdot \|b\|$ and $\|(I - a)^{-1}\| \le 1/(1 - \|a\|)$
when $\|a\| < 1$. Thus $\|D_n^*(Y_n)\|$ will be small whenever $\|D_n(Y_n)\|$ is small
and hence 3.3.1.iii implies $\|D_n^*(Y_n)\| \xrightarrow{P} 0$, which in turn implies that
$A_n^{-1}(Y_n) \xrightarrow{P} -J^{-1}$. Therefore, the multivariate version of Slutsky's
Theorem and 3.3.1.ii imply that $A_n^{-1}(Y_n)a_n(Y_n) \xrightarrow{D} N_p(0, J^{-1})$.

Now choose $\eta$ so that when $Z \sim N_p(0, J^{-1})$, $P\{\|Z\| \ge \eta\} < \tfrac{1}{2}\epsilon^*$. Now
$\|\cdot\|$ is a continuous function of its elements; therefore $Z_n \xrightarrow{D} Z$
implies $\|Z_n\| \xrightarrow{D} \|Z\|$. Thus there exists $n_1$ such that for $n > n_1$

$$P\{\|Z_n\| \ge \eta\} < P\{\|Z\| \ge \eta\} + \frac{\epsilon^*}{2} < \epsilon^*$$

by definition of convergence in distribution. Letting $Z_n = A_n^{-1}(Y_n)a_n(Y_n)$
we find that for $n > n_1$

$$P\{\|A_n^{-1}(Y_n)a_n(Y_n)\| \ge \eta\} < \epsilon^*. \qquad (1)$$

This proves Condition 3.2.1.iii except for an event of at most
probability $\epsilon^*$.
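The matrix bound $\|(I - a)^{-1}\| \le 1/(1 - \|a\|)$ for $\|a\| < 1$, used earlier in this proof, can be checked numerically for a particular case. The sketch below is only an illustration: the $2 \times 2$ matrix `a` is a hypothetical example, and the maximum-row-sum norm is chosen as one convenient induced matrix norm.

```python
def mat_inf_norm(M):
    """Maximum absolute row sum, the matrix norm induced by the max vector norm."""
    return max(sum(abs(v) for v in row) for row in M)


def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix by the closed-form formula."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


# Hypothetical matrix with norm 0.35 < 1.
a = [[0.2, -0.1], [0.05, 0.3]]
I_minus_a = [[1 - a[0][0], -a[0][1]], [-a[1][0], 1 - a[1][1]]]

lhs = mat_inf_norm(inv2(I_minus_a))        # norm of (I - a)^{-1}
rhs = 1.0 / (1.0 - mat_inf_norm(a))        # the bound 1 / (1 - ||a||)
```

For this example `lhs` is about 1.50 and `rhs` is about 1.54, consistent with the inequality.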
Now show that $P\{|A_n(Y_n)| = 0\} < \epsilon^*$ for $n$ large. $A_n(Y_n) = -[J - D_n(Y_n)]$
and thus $|A_n(Y_n)| = (-1)^p |J - D_n(Y_n)|$. But $-A_n(Y_n)$ is a symmetric matrix
and therefore its determinant can be shown to be nonzero by proving
that it is positive definite; that is, it must be shown that
$\lambda_{\min}\{J - D_n(Y_n)\} > 0$. $J$ is positive definite; therefore, $\lambda_{\min}(J) > 0$.
Furthermore, it is true that $\max_{j=1,2,\ldots,p} |\lambda_j(C_n)| \to 0$ if and only if
$[C_n]_{ij} \to 0$ for $i, j = 1, 2, \ldots, p$, if and only if $\|C_n\| \to 0$. Therefore by
assumption 3.3.1.iii there exists $n_2 \ge n_1$ such that for $n > n_2$

$$P\{\max_{j=1,2,\ldots,p} |\lambda_j\{D_n(Y_n)\}| > \tfrac{1}{2}\lambda_{\min}(J)\} < \epsilon^*,$$

which implies

$$P\{|A_n(Y_n)| = 0\} < \epsilon^*. \qquad (2)$$

This proves Condition 3.2.1.ii except for an event of at most
probability $\epsilon^*$.


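Since $\|D_n^*(Y_n)\| \xrightarrow{P} 0$ there also exists $n_3 \ge n_2$ such that for
$n > n_3$, $P\{\|D_n^*(Y_n)\| > \|J^{-1}\|\} < \epsilon^*$. But
$\|A_n^{-1}(Y_n)\| = \|-J^{-1} + D_n^*(Y_n)\| \le \|J^{-1}\| + \|D_n^*(Y_n)\|$,
which implies for $n > n_3$

$$P\{\|A_n^{-1}(Y_n)\| > 2\|J^{-1}\|\} < \epsilon^*.$$

Now apply 3.3.1.iv with $b = 4\eta$, $\epsilon = \epsilon^*$ and $\delta = \eta/(8\|J^{-1}\|)$. Then 3.3.1.iv
implies that there exists $n_4 \ge n_3$ such that for $n > n_4$

$$P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \|R_n(\psi_n, Y_n)\| > \frac{\eta}{8\|J^{-1}\|}\} < \epsilon^*.$$

The last two inequalities together imply

$$P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \|A_n^{-1}(Y_n)R_n(\psi_n, Y_n)\| > \frac{\eta}{4}\} < 2\epsilon^*. \qquad (3)$$

This proves Condition 3.2.1.iv except for an event of at most probability
$2\epsilon^*$.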

Now apply 3.3.1.i with $b = 4\eta$, $\epsilon = \epsilon^*$ to claim that there exists
$n_5 \ge n_4$ such that for $n > n_5$

$$P\{\text{It is not true that } G_n(\psi_n, Y_n) = a_n(Y_n) + A_n(Y_n)(\psi_n - \psi_{0n}) + R_n(\psi_n, Y_n)\} < \epsilon^*. \qquad (4)$$

This proves Condition 3.2.1.i except for an event of at most probability
$\epsilon^*$.

Assumption 3.3.1.v with $b = 4\eta$ and $\epsilon = \epsilon^*$ guarantees that there
exists $n_6 \ge n_5$ such that for $n > n_6$,

$$P\{J_{G_n}(\psi_n, Y_n) \text{ is not continuous in } S_{4\eta}(\psi_{0n})\} < \epsilon^*. \qquad (5)$$

This proves Condition 3.2.1.v except for an event of at most probability
$\epsilon^*$.

Now Conditions 3.2.1.vi.a-3.2.1.vi.c must be proved. If
$f_n(\psi_n) = A_n^{-1}(Y_n)G_n(\psi_n, Y_n)$, then $J_{f_n}(\psi_n, Y_n) = A_n^{-1}(Y_n)J_{G_n}(\psi_n, Y_n)$.
But

$$J_{G_n}(\psi_n, Y_n) = -J + E_n(\psi_n, Y_n),$$

whence

$$J_{f_n}(\psi_n, Y_n) = A_n^{-1}(Y_n)J_{G_n}(\psi_n, Y_n)$$

$$= [-J^{-1} + D_n^*(Y_n)][-J + E_n(\psi_n, Y_n)]$$

$$= I - D_n^*(Y_n)J - J^{-1}E_n(\psi_n, Y_n) + D_n^*(Y_n)E_n(\psi_n, Y_n)$$

$$\equiv I + E_n^*(\psi_n, Y_n).$$


Clearly $\|E_n^*(\psi_n, Y_n)\|$ will be small whenever $\|D_n^*(Y_n)\|$ and $\|E_n(\psi_n, Y_n)\|$
are small, so that Conditions 3.3.1.iii and 3.3.1.vi with $b = 4\eta$ can
be used to show that there exists $n_7 \ge n_6$ such that for $n > n_7$

$$P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \max_{j=1,2,\ldots,p} |\lambda_j\{E_n^*(\psi_n, Y_n)\}| > \tfrac{1}{2}\} < \epsilon^*.$$

Now if $\psi_{1n}$ is some particular point in $S_{4\eta}(\psi_{0n})$ and $B_n(Y_n) \equiv J_{f_n}(\psi_{1n}, Y_n)$,
the above probability statement implies (by the same method used to show
$|A_n(Y_n)| \ne 0$ above) that for $n > n_7$

$$P\{|B_n(Y_n)| = 0\} < \epsilon^*. \qquad (6)$$

This proves Condition 3.2.1.vi.a except for an event of at most
probability $\epsilon^*$.

Using the same reasoning as that used for $A_n^{-1}(Y_n)$ it is true
that when $\|E_n^*(\psi_{1n}, Y_n)\|$ is small

$$B_n^{-1}(Y_n) = [I + E_n^*(\psi_{1n}, Y_n)]^{-1} \equiv I + E_n^{**}(\psi_{1n}, Y_n).$$

Thus

$$\|B_n^{-1}(Y_n)\| \le \|I\| + \|E_n^{**}(\psi_{1n}, Y_n)\| \le 1 + \|E_n^{**}(\psi_{1n}, Y_n)\|.$$

But $\|E_n^{**}(\psi_{1n}, Y_n)\|$ will be small whenever $\|E_n^*(\psi_{1n}, Y_n)\|$ is small and
3.3.1.iii and 3.3.1.vi can be used to imply this. Thus again using
3.3.1.iii and 3.3.1.vi with $b = 4\eta$, it can be shown that there exists
$n_8 \ge n_7$ such that for $n > n_8$, $P\{\|E_n^{**}(\psi_{1n}, Y_n)\| > \tfrac{1}{5}\} < \epsilon^*$. This, of
course, implies that

$$P\{\|B_n^{-1}(Y_n)\| > \tfrac{6}{5}\} < \epsilon^*. \qquad (7)$$

This proves Condition 3.2.1.vi.b except for an event of at most
probability $\epsilon^*$.
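If $\psi_n$ is any point in $S_{4\eta}(\psi_{0n})$ then

$$\|J_{f_n}(\psi_n, Y_n) - B_n(Y_n)\| \le \frac{1}{2\|B_n^{-1}(Y_n)\|}$$

if and only if $\|J_{f_n}(\psi_n, Y_n) - B_n(Y_n)\| \cdot \|B_n^{-1}(Y_n)\| \le \tfrac{1}{2}$.

Note that

$$\|J_{f_n}(\psi_n, Y_n) - B_n(Y_n)\| = \|I + E_n^*(\psi_n, Y_n) - [I + E_n^*(\psi_{1n}, Y_n)]\|$$

$$= \|E_n^*(\psi_n, Y_n) - E_n^*(\psi_{1n}, Y_n)\|$$

$$\le \|E_n^*(\psi_n, Y_n)\| + \|E_n^*(\psi_{1n}, Y_n)\|.$$

If $\|E_n^{**}(\psi_{1n}, Y_n)\| \le \tfrac{1}{5}$ and $\|E_n^*(\psi_n, Y_n)\| \le \tfrac{1}{5}$ for all $\psi_n \in S_{4\eta}(\psi_{0n})$ then

$$\|J_{f_n}(\psi_n, Y_n) - B_n(Y_n)\| \cdot \|B_n^{-1}(Y_n)\| \le [\|E_n^*(\psi_n, Y_n)\| + \|E_n^*(\psi_{1n}, Y_n)\|] \cdot \|B_n^{-1}(Y_n)\|$$

$$\le (\tfrac{1}{5} + \tfrac{1}{5}) \cdot \tfrac{6}{5} = \tfrac{12}{25} < \tfrac{1}{2}.$$

Again using 3.3.1.iii and 3.3.1.vi, there exists $n_9 \ge n_8$ such that for
$n > n_9$

$$P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \|E_n^*(\psi_n, Y_n)\| > \tfrac{1}{5}\} < \epsilon^*.$$

It follows that

$$P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \|J_{f_n}(\psi_n, Y_n) - B_n(Y_n)\| > \frac{1}{2\|B_n^{-1}(Y_n)\|}\}$$

$$\le P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \|E_n^*(\psi_n, Y_n)\| > \tfrac{1}{5} \text{ or } \|E_n^{**}(\psi_{1n}, Y_n)\| > \tfrac{1}{5}\}$$

$$\le 2\epsilon^* \qquad (8)$$

because $n > n_9 \ge n_8$. This proves Condition 3.2.1.vi.c except for an
event of at most probability $2\epsilon^*$.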
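It has now been shown in (1)-(8) that the conditions of Theorem
3.2.1 are true for each $n > n_9$ except for an event of probability at most
$\epsilon^* + \epsilon^* + 2\epsilon^* + \epsilon^* + \epsilon^* + \epsilon^* + \epsilon^* + 2\epsilon^* = 10\epsilon^* = \epsilon$. Thus Theorem 3.2.1 applies and
there exists $\hat{\psi}_n(Y_n)$, a solution of $G_n(\psi_n, Y_n) = 0$, such that
$\hat{\psi}_n(Y_n) \in S_{4\eta}(\psi_{0n})$ with probability greater than $1 - \epsilon$ for $n > n_9$.

But if $G_n(\hat{\psi}_n(Y_n), Y_n) = 0$, then by 3.3.1.i

$$\hat{\psi}_n(Y_n) - \psi_{0n} = -A_n^{-1}(Y_n)a_n(Y_n) - A_n^{-1}(Y_n)R_n(\hat{\psi}_n(Y_n), Y_n).$$

The first term converges in distribution to $N_p(0, J^{-1})$ as was shown
above. That the latter term converges in probability to 0 is seen by
the following remarks.

$$P\{\|A_n^{-1}(Y_n)R_n(\hat{\psi}_n(Y_n), Y_n)\| > \delta\}$$

$$\le P\{\|A_n^{-1}(Y_n)R_n(\hat{\psi}_n(Y_n), Y_n)\| > \delta \text{ and } \hat{\psi}_n(Y_n) \in S_{4\eta}(\psi_{0n})\} + P\{\hat{\psi}_n(Y_n) \notin S_{4\eta}(\psi_{0n})\}$$

$$\le P\{\sup_{\psi_n \in S_{4\eta}(\psi_{0n})} \|A_n^{-1}(Y_n)R_n(\psi_n, Y_n)\| > \delta\} + P\{\hat{\psi}_n(Y_n) \notin S_{4\eta}(\psi_{0n})\}.$$

But the first term is small for $n$ large as shown in proving (3). The
latter term is small for $n$ large by the entire first part of the theorem.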
The resulting inequality proves that $A_n^{-1}(Y_n)R_n(\hat{\psi}_n(Y_n), Y_n) \xrightarrow{P} 0$. Then
the multivariate version of Slutsky's Theorem implies that
$\hat{\psi}_n(Y_n) - \psi_{0n} \xrightarrow{D} N_p(0, J^{-1})$. Then Theorem 3.3.1 is proved with $b = 4\eta$
and $n_0 = n_9$ as above.
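The conclusion of Theorem 3.3.1 can be illustrated by simulation in the simplest textbook case. The sketch below is an assumption-laden illustration, not part of the proof: it takes independent $N(\theta_0, 1)$ observations with known variance, so that the normalized parameter is $\psi_n = \sqrt{n}\,\theta$, the root of the likelihood equation is $\sqrt{n}$ times the sample mean, and $J = 1$. The values of `theta0`, `n`, `reps`, and the seed are arbitrary choices.

```python
import math
import random


def normalized_root(sample, theta0):
    """Return psi_hat_n - psi_0n for i.i.d. N(theta, 1) data, with psi = sqrt(n) * theta."""
    n = len(sample)
    theta_hat = sum(sample) / n            # root of the likelihood equation
    psi_hat = math.sqrt(n) * theta_hat     # normalizing factor built into the estimate
    psi_0n = math.sqrt(n) * theta0         # and into the "true" parameter
    return psi_hat - psi_0n


random.seed(0)                             # fixed seed so the run is reproducible
theta0, n, reps = 3.0, 50, 2000            # arbitrary illustrative values

diffs = [
    normalized_root([random.gauss(theta0, 1.0) for _ in range(n)], theta0)
    for _ in range(reps)
]
mean_diff = sum(diffs) / reps
var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (reps - 1)
```

Under these assumptions `mean_diff` should be close to 0 and `var_diff` close to $J^{-1} = 1$, in line with the limiting distribution $N_p(0, J^{-1})$.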


CHAPTER  4 


ASYMPTOTIC  THEORY  FOR  THE  ANALYSIS  OF  VARIANCE  MODEL 
4.1.  Introduction 

In this chapter the maximum likelihood estimates are proved to
be consistent and asymptotically normal as the size of the experimental
design increases. This is done in Theorem 4.4.1, the main result of
this paper. In Section 4.2 the assumptions used to prove the asymptotic
properties are discussed; in Section 4.3 the setup used to prove
Theorem 4.4.1 by application of Theorem 3.3.1 is explained; Theorem
4.4.1 is stated in Section 4.4 and proved in Section 4.5. (The details
of the proof are given in Appendix A.) In Section 4.6 it is noted that
the maximum likelihood estimates are asymptotically efficient in the
sense of attaining the Cramer-Rao lower bound for the covariance matrix.

Asymptotic theory is useful as a practical tool when the
experimenter has confidence that his experiment is "large enough" for the
good asymptotic properties to hold. In the case of independent,
identically distributed observations, one hopes one has taken enough
observations; in the case of the analysis of variance, one hopes the
size of the design is large enough. The device used to prove the
good asymptotic properties in the analysis of variance is a "conceptual
sequence of experiments." For each $n$ of a sequence of values of $n$
increasing to infinity, an experimental design is considered. Each
experiment may be an extension of previous experiments or it may be
an entirely different design. The only requirement is that the
sequence of experiments have the properties required in Section 4.2.

Thus any particular experimental design encountered in practice may
be thought of as a part of some sequence of designs; the experimenter
hopes that the size of his particular design is "large enough" for
the good asymptotic properties to hold. (In the independent,
identically distributed case, there are results attempting to consider how
large is "large enough"; no such attempt is made in this paper. This
may be a fruitful subject for further research.) The assumptions of
Section 4.2 may seem to be restrictive upon first examination; however,
they rule out no experimental designs of practical interest. Thus
Theorem 4.4.1 has wide applicability in the analysis of variance.

In Section 4.3 the reparameterization required for the application
of Theorem 3.3.1 is discussed. It is shown that for each $n$ the parameter
vector $\psi_n$ is obtained from $\theta$ by multiplying each component of $\theta$ by the
appropriate normalizing factor. The derivatives of the log-likelihood
up to second order, which are needed for Theorem 4.4.1, are computed
and a matrix used to compute the asymptotic covariance matrix is
defined. Theorem 4.4.1 is stated in Section 4.4 and is proved using
a sequence of lengthy lemmas, each of which proves that one or more
of the conditions of Theorem 3.3.1 is true. These lemmas constitute
the details of the proof and may be omitted without loss of continuity
to the reader; they are presented in Section A.4.
4.2.  Assumptions  for  Asymptotic  Theory

The assumptions needed to carry out the asymptotic theory
arguments will be stated and then briefly explained in this section.
Under consideration will be a "conceptual sequence of experiments",
each following the basic model for the analysis of variance described
in Section 1.3. An experiment in this sequence may be an extension of
previous experiments or an entirely different design. However, all
such sequences must have the properties described by the following
assumptions.


ASSUMPTION 4.2.1.  $n$ and each $m_i$, $i = 1, 2, \ldots, p_1$, tend to infinity;
each $m_i$ can be considered a function of $n$.

ASSUMPTION 4.2.2.  Let $m_0 = n$; then for each $i, j = 0, 1, \ldots, p_1$, either

$$\lim_{n \to \infty} \frac{m_i}{m_j} = \rho_{ij} \quad \text{or} \quad \lim_{n \to \infty} \frac{m_j}{m_i} = \rho_{ji}$$

exists. (If $\rho_{ij} = 0$, then let $\rho_{ji} = \infty$ for notational convenience.)

Now without loss of generality, let the $U_i$ be labeled so that for
$i < j$, $\rho_{ij} > 0$; i.e. the $m_i$ are in decreasing order of magnitude.

10  i 

Generate a partition of the integers $\{0, 1, \ldots, p_1\}$, $S_0, S_1, \ldots, S_c$, so
that for indices $i$ in the same set $S_s$, the associated $m_i$'s have the
same order of magnitude. Such a partition is generated as follows:

i)  $i_0 \equiv 0$; $S_0 = \{0\}$;

ii)  For $s = 1, 2, \ldots$, it is true that $i_s \in S_s$. Then for
$i = i_s + 1, i_s + 2, \ldots$, include $i$ in $S_s$ until $\rho_{i_s i} = \infty$;
call the first value of $i$ where this occurs $i_{s+1}$; then $i_{s+1} \in S_{s+1}$.

iii)  Continue as in Step ii until $p_1$ has been placed in a
set. Call this set $S_c$.

There are then $c + 1$ sets in the partition, $S_0, S_1, \ldots, S_c$, and
$S_s = \{i_s, \ldots, i_{s+1} - 1\}$. (Where $i_{c+1} \equiv p_1 + 1$ to insure $S_c$ is correct.)

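Define sets $\bar{S}_s$ as follows:

$$\bar{S}_s = \bigcup_{t=s}^{c} S_t, \quad s = 1, 2, \ldots, c, \qquad \bar{S}_{c+1} = \emptyset.$$

The $\bar{S}_s$ are then sets of indices whose associated $m_i$ have the same or
smaller orders of magnitude when compared with $m_{i_s}$.

For each $i = 1, 2, \ldots, p_1$, $i \in S_s$ for some $s = 1, 2, \ldots, c$. Define
sequences (depending on $n$) as follows:

$$\nu_i = \mathrm{rank}[U_{i_s} : U_{i_s+1} : \cdots : U_{p_1}] - \mathrm{rank}[U_{i_s} : \cdots : U_{i-1} : U_{i+1} : \cdots : U_{p_1}], \quad i = 1, 2, \ldots, p_1,$$

$$\nu_0 = n - \mathrm{rank}[U_1 : \cdots : U_{p_1}].$$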
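The rank differences just defined can be computed directly for a small design. The sketch below is an illustration under assumptions: it uses a hypothetical two-way crossed layout with one observation per cell, treats both random factors as belonging to the same set of indices (called `tail` here), and computes ranks by exact Gaussian elimination on fractions. The helper names `rank`, `hstack`, and `nu` are introduced only for this example.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            if factor != 0:
                m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


def hstack(mats, n):
    """Join matrices with n rows each column-wise; no matrices gives n empty rows."""
    return [[v for M in mats for v in M[k]] for k in range(n)]


def nu(i, U, tail):
    """Rank of all U_j, j in tail, minus the rank of the same matrices with U_i left out."""
    n = len(U[i])
    full = hstack([U[j] for j in tail], n)
    without = hstack([U[j] for j in tail if j != i], n)
    return rank(full) - rank(without)


# Hypothetical design: a = 2 rows crossed with b = 3 columns, one observation per cell.
a, b = 2, 3
cells = [(r, c) for r in range(a) for c in range(b)]
U = {
    1: [[1 if r == k else 0 for k in range(a)] for r, c in cells],  # row-effect design matrix
    2: [[1 if c == k else 0 for k in range(b)] for r, c in cells],  # column-effect design matrix
}
tail = [1, 2]                          # assumed: both factors are of the same order

nu1 = nu(1, U, tail)
nu2 = nu(2, U, tail)
nu0 = len(cells) - rank(hstack([U[1], U[2]], len(cells)))
```

Since the combined matrix has rank $a + b - 1 = 4$, the three values are $a - 1$, $b - 1$, and $ab - (a + b - 1)$.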
0  ~1  ^  J 


ASSUMPTION 4.2.3.  Let $r_i = \lim_{n \to \infty} \nu_i / m_i$, $i = 0, 1, \ldots, p_1$; then each of
the $r_i$ exists and is positive.

For each $U_i$, $i = 1, 2, \ldots, p_1$, let the columns of $U_i$ be given by

$$U_i = [u^{(i)}_1, u^{(i)}_2, \ldots, u^{(i)}_{m_i}].$$

ASSUMPTION 4.2.4.  For every $i$ and every $j \in \bar{S}_s$, $j \ne i$, where
$i \in S_s$, there exist two nonnegative constants, $R_1$ and $R_2$, both less
than or equal to one, such that

m3  ,  .2 

/  (■  VU'UT")  s  *2 

t’1  Si  Si 


for all but $R_1 m_i$ values of $k$ in the set $\{1, 2, \ldots, m_i\}$. Furthermore,
$R_1$ and $R_2$ are such that


Rl+  (1-VR2  ^  N(S  )  +  1  » 


where $N(\bar{S}_s)$ is the number of indices in the set $\bar{S}_s$.

ASSUMPTION 4.2.5.  Let $\theta_0 = (\alpha_0', \sigma_0')'$, where $\sigma_0 = (\sigma_{00}, \sigma_{01}, \ldots, \sigma_{0p_1})'$,

Pi 

be  the  true  parameter  point  -which  is  being  estimated  and  XI  _r,  a0j~3 


be  the  true  covariance  matrix.  Then  there  exists  a  sequence  v. 


prL 


(depending  on  n)  and  a  PqXPq  positive  definite  matrix  such  tha~ 


^-0 
The following is an attempt to briefly explain these assumptions.

The  object  of  the  assumptions  is  to  rule  out  certain  sequences  of 
experiments  for  which  the  limiting  distributions  either  degenerate  or 
"blow  up"  (See  Section  1.1.).  For  example,  asymptotic  theory  requires 
an  expanding  sequence  of  experiments,  which  is  what  Assumption  4.2.1 
requires.  Assumption  4.2.2  requires  that  the  expansion  should  be 
orderly  —  sizes  of  various  parts  of  the  design  should  relate  to  each 
other  in  an  orderly  way.  There  is  no  need  to  consider  disorganized 
sequences  and  so  nothing  of  importance  is  lost  by  these  first  two 
assumptions. 

The next three assumptions require that the sequence not be a
degenerate one. That is, the matrix $J$ defined in Section 4.3 must not
degenerate; it must be positive definite. These assumptions insure
that this is so. The $\nu_i$ referred to in Assumption 4.2.3 is the dimension
of the part of the linear space spanned by the columns of $U_i$ which is
orthogonal to the space spanned by the columns of the other $U_j$, where
$j \in \bar{S}_s$, $j \ne i$ and $i \in S_s$. Thus $\nu_i$ is the dimension of the part of $U_i$
not dependent on the other $U_j$. Assumption 4.2.3 says that this part
remains an integral part of $U_i$; it does not get overwhelmed by the
other columns of $U_i$. It could be said that this assumption requires
that the $i$th effect not be "asymptotically confounded" with the effects
associated with the other $U_j$ mentioned above. Such "asymptotically
confounded" design sequences are of little interest and nothing is lost
if they are ignored. It should be noted that this assumption implies
that $\nu_i$ and $m_i$ are of the same order of magnitude and hence $\nu_i \to \infty$,
$i = 0, 1, \ldots, p_1$, by Assumption 4.2.1.


Assumption 4.2.4 is somewhat more difficult to explain. Its use occurs naturally in Section A.4.1 and it is explained there also.

This assumption could be described as a requirement for "almost orthogonality" of the designs (orthogonal in the language of experimental design). If in fact the designs are orthogonal in this sense,

R^=  0  and  R^-*  o  as  n  The  seeming  restrictiveness  of  this  assump¬ 

tion  is  just  that;  it  seems  to  be  restrictive  but  it  rules  out  nothing 
of any real interest. Any reasonable crossed or nested design sequence,
whether  balanced  or  not,  will  satisfy  this  assumption.  Further  light 
may  be  shed  on  this  subject  in  Section  6.5.  There  an  example  is  given 
of  a  design  sequence  for  which  Assumption  4.2.4  does  not  hold  even 
though  Assumption  4.2.3  does.  (For  the  design  sequence  in  Section 
6.5,  it  can  be  shown  that  J  is  not  positive  definite,  even  though 
Assumption  4.2.3  holds.  Thus  some  stronger  assumption  like  Assumption 
4.2.4  is  necessary  in  this  problem.)  Note  that  Assumption  4.2.4 
requires that the columns of $U_i$ not be "too" dependent on the columns of the other $U_j$ just as Assumption 4.2.3 does, but that "too" dependent is defined in a slightly stronger sense than in Assumption 4.2.3.

Assumption  4.2.5  again  rules  out  certain  degenerate  design 
sequences.  These  sequences  are  such  that  the  fixed  effects  cannot  be 
estimated  properly.  Again  there  is  no  loss  in  not  considering  such 
design sequences. Thus in all cases, the assumptions above eliminate
only  disorganized  or  degenerate  sequences  of  experiments  which  are  of 
no  real  interest  in  any  case. 

The sequences $\nu_i^{1/2}$, $i=0,1,\ldots,p_1+1$, will become the proper normalizing sequences for the estimates of the parameters $\sigma$ and $\alpha$. $\nu_i^{1/2}$ is the sequence used for $\sigma_i$, $i=0,1,\ldots,p_1$, and $\nu_{p_1+1}^{1/2}$ is the sequence for $\alpha$. Note that only one normalizing sequence has been allowed for all the elements of $\alpha$. This usually is the correct thing to do; however, with a nontrivial amount of work of the same sort used in the remainder of this chapter, the theory can be extended to allow different rates for each element of $\alpha$.


4.3  Final  Setup  for  Asymptotic  Theory 

In this section the transformation from $\theta$ to $\psi_n$ for each n is defined. The normalizing factors used will be $n_i = \nu_i^{1/2}$, $i=0,1,\ldots,p_1+1$, where the $\nu_i$ are defined as in Section 4.2. All the $\nu_i$ and hence the $n_i$ are considered as sequences (depending on n) increasing to infinity.

The reparameterization used will be $\theta \to \psi_n$, with $\theta' = (\alpha', \sigma')$ and $\psi_n' = (\beta_n', \tau_n')$, where $\beta_n = n_{p_1+1}\,\alpha$ and $[\tau_n]_i = n_i\sigma_i$, $i=0,1,\ldots,p_1$. The "true" parameter $\theta_0$ then transforms for each n to $\psi_{0n}' = (\beta_{0n}', \tau_{0n}')$.

The log-likelihood$^1$ then becomes

\[ \lambda(y, \psi_n) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|T_n| - \frac{1}{2}\Bigl(y - X\frac{\beta_n}{n_{p_1+1}}\Bigr)' T_n^{-1} \Bigl(y - X\frac{\beta_n}{n_{p_1+1}}\Bigr), \]

where $T_n = \sum_{i=0}^{p_1} \frac{[\tau_n]_i}{n_i}\, G_i$.

$^1$ The notation is abused somewhat by the use of $\lambda(y,\psi_n)$ and later $\lambda(y,\psi)$ to represent the log-likelihood function in terms of $\psi$, because $\lambda(y,\theta)$ is also used to represent the log-likelihood function in terms of $\theta$, and $\lambda(y,\theta) \ne \lambda(y,\psi)$ when $\theta = \psi$. However, it is clear from context which function $\lambda$ is being referred to.


Now choose a norm for a $p = p_0+p_1+1$ dimensional vector $a = (a_1, a_2, \ldots, a_p)'$ as follows: $\|a\| = \max_{i=1,2,\ldots,p} |a_i|$. Note that if $\hat\psi_n$ is an estimate of $\psi_{0n}$ and $\hat\theta_n$ is the corresponding estimate of $\theta_0$ obtained by applying the inverse of the above transformation, then if $\|\hat\psi_n - \psi_{0n}\| < b$ for $n > n_0$, this implies that for all such n

\[ |[\hat\tau_n]_i - [\tau_{0n}]_i| = n_i\,|[\hat\sigma_n]_i - \sigma_{0i}| < b, \quad i=0,1,\ldots,p_1, \]

and

\[ |[\hat\beta_n]_j - [\beta_{0n}]_j| = n_{p_1+1}\,|[\hat\alpha_n]_j - \alpha_{0j}| < b, \quad j=1,2,\ldots,p_0. \]

This of course implies $|[\hat\sigma_n]_i - \sigma_{0i}| < b/n_i$, $i=0,1,\ldots,p_1$, and $|[\hat\alpha_n]_j - \alpha_{0j}| < b/n_{p_1+1}$, $j=1,2,\ldots,p_0$; that is, the estimator $\hat\theta_n$ is approaching $\theta_0$, the true parameter, with each component converging at perhaps a different rate.
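The link between a max-norm bound on the reparameterized estimate and component-wise rates for the original parameters can be traced numerically. The sketch below is illustrative only: the function names `to_psi`, `to_theta` and `max_norm`, the normalizing values in `n_seq`, and all parameter values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def to_psi(alpha, sigma, n_seq):
    """Map theta = (alpha, sigma) to psi = (beta, tau).

    n_seq holds n_0, ..., n_{p1}, n_{p1+1}; the last entry scales alpha.
    """
    alpha, sigma, n_seq = (np.asarray(v, dtype=float) for v in (alpha, sigma, n_seq))
    return n_seq[-1] * alpha, n_seq[:-1] * sigma

def to_theta(beta, tau, n_seq):
    """Inverse map psi = (beta, tau) -> theta = (alpha, sigma)."""
    beta, tau, n_seq = (np.asarray(v, dtype=float) for v in (beta, tau, n_seq))
    return beta / n_seq[-1], tau / n_seq[:-1]

def max_norm(a):
    """The norm ||a|| = max_i |a_i| used in the text."""
    return float(np.max(np.abs(a)))

# Hypothetical values with p0 = 2 fixed effects and p1 = 1 random effect.
n_seq = np.array([20.0, 5.0, 10.0])              # n_0, n_1, n_{p1+1}
alpha0, sigma0 = np.array([1.0, -2.0]), np.array([1.5, 0.7])
alpha_hat, sigma_hat = np.array([1.03, -1.96]), np.array([1.52, 0.61])

beta0, tau0 = to_psi(alpha0, sigma0, n_seq)
beta_hat, tau_hat = to_psi(alpha_hat, sigma_hat, n_seq)

# b is the max-norm distance in the psi parameterization.
b = max_norm(np.concatenate([beta_hat - beta0, tau_hat - tau0]))

# Implied component-wise bounds in the theta parameterization.
sigma_bounds = b / n_seq[:-1]     # bound for each sigma_i
alpha_bound = b / n_seq[-1]       # common bound for every alpha_j
```

Each component of `sigma_hat - sigma0` is then no larger in absolute value than the matching entry of `sigma_bounds`, and each component of `alpha_hat - alpha0` is bounded by `alpha_bound`, which is the different-rates statement in miniature.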

It is now necessary to write out the derivatives of the log likelihood up to second order to get the functions used in Theorem 4.4.1.

At this point the notation of dependence on n will be suppressed. $\beta$ and $T$ will be used without the subscript n, understanding that everything, $X$ and $G_i$ included, depends on n. All logic is carried out for a certain value of $y$ which may be required to belong to a certain set (which set will have large probability).

The appropriate derivatives are given below. Matrix and vector derivatives are used where appropriate and expressions have been algebraically simplified where possible. In each case the indices i and j run from 0 to $p_1$.


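\[ \frac{\partial\lambda}{\partial\beta} = \frac{1}{n_{p_1+1}}\, X' T^{-1}\Bigl(y - X\frac{\beta}{n_{p_1+1}}\Bigr), \]

a $p_0 \times 1$ vector;

\[ \frac{\partial\lambda}{\partial\tau_i} = \frac{1}{2n_i}\Bigl[ -\operatorname{tr} T^{-1}G_i + \Bigl(y - X\frac{\beta}{n_{p_1+1}}\Bigr)' T^{-1}G_iT^{-1} \Bigl(y - X\frac{\beta}{n_{p_1+1}}\Bigr) \Bigr]; \]

\[ \frac{\partial^2\lambda}{\partial\beta\,\partial\beta'} = -\frac{1}{n_{p_1+1}^2}\, X'T^{-1}X, \]

a $p_0 \times p_0$ matrix;

\[ \frac{\partial^2\lambda}{\partial\tau_i\,\partial\beta} = -\frac{1}{n_i n_{p_1+1}}\, X'T^{-1}G_iT^{-1}\Bigl(y - X\frac{\beta}{n_{p_1+1}}\Bigr), \]

a $p_0 \times 1$ vector;

\[ \frac{\partial^2\lambda}{\partial\tau_i\,\partial\tau_j} = \frac{1}{2n_in_j}\Bigl[ \operatorname{tr} T^{-1}G_iT^{-1}G_j - 2\Bigl(y - X\frac{\beta}{n_{p_1+1}}\Bigr)' T^{-1}G_iT^{-1}G_jT^{-1} \Bigl(y - X\frac{\beta}{n_{p_1+1}}\Bigr) \Bigr]. \]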
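Derivative formulas of this kind are easy to get wrong by a sign or a factor, so a finite-difference check is a useful sanity test. The sketch below is not part of the original development: the small one-way design, the normalizing values in `n_seq`, and the helper names (`T_of`, `loglik`, `grad_beta`, `grad_tau`, `hess_tau`, `central_diff`) are all assumptions made for illustration, with $T = \sum_i (\tau_i/n_i) G_i$ and the normal log-likelihood taken in the reparameterized form of this section.

```python
import numpy as np

rng = np.random.default_rng(0)
n, groups = 6, 3
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
U1 = np.kron(np.eye(groups), np.ones((n // groups, 1)))
G = [np.eye(n), U1 @ U1.T]                 # G_0 = I, G_1 = U_1 U_1'
n_seq = np.array([2.0, 1.5, 3.0])          # hypothetical n_0, n_1, n_{p1+1}
y = rng.normal(size=n)

def T_of(tau):
    # zip stops at the shortest input, so only n_0..n_{p1} are used here.
    return sum(t / ni * Gi for t, ni, Gi in zip(tau, n_seq, G))

def resid(beta):
    return y - X @ beta / n_seq[-1]

def loglik(beta, tau):
    T, r = T_of(tau), resid(beta)
    _, logdet = np.linalg.slogdet(T)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * r @ np.linalg.solve(T, r)

def grad_beta(beta, tau):
    return X.T @ np.linalg.solve(T_of(tau), resid(beta)) / n_seq[-1]

def grad_tau(beta, tau):
    Tinv, r = np.linalg.inv(T_of(tau)), resid(beta)
    return np.array([(-np.trace(Tinv @ Gi) + r @ Tinv @ Gi @ Tinv @ r) / (2 * ni)
                     for Gi, ni in zip(G, n_seq)])

def hess_tau(beta, tau):
    Tinv, r = np.linalg.inv(T_of(tau)), resid(beta)
    H = np.empty((len(G), len(G)))
    for i, Gi in enumerate(G):
        for j, Gj in enumerate(G):
            A = Tinv @ Gi @ Tinv @ Gj
            H[i, j] = (np.trace(A) - 2 * r @ A @ Tinv @ r) / (2 * n_seq[i] * n_seq[j])
    return H

def central_diff(f, x, h=1e-5):
    """Row k holds the central-difference derivative of f with respect to x[k]."""
    x = np.asarray(x, dtype=float)
    rows = []
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        rows.append((f(x + e) - f(x - e)) / (2 * h))
    return np.array(rows)

beta, tau = np.array([0.3, -0.1]), np.array([2.0, 1.2])
num_grad_beta = central_diff(lambda b: loglik(b, tau), beta)
num_grad_tau = central_diff(lambda t: loglik(beta, t), tau)
num_hess_tau = central_diff(lambda t: grad_tau(beta, t), tau)
```

Comparing `num_grad_beta`, `num_grad_tau` and `num_hess_tau` with `grad_beta`, `grad_tau` and `hess_tau` at the same point should show agreement up to finite-difference error.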
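If, under some conditions, the second derivatives can be shown to be continuous in a neighborhood of $\psi_{0n}$ (the true parameter), then under those conditions a Taylor series can be employed, yielding

\[ \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_n} = \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_{0n}} + \sum_{j=1}^{p} \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n^*} \bigl([\psi_n]_j - [\psi_{0n}]_j\bigr), \]

where $\psi_n^* = \mu\psi_{0n} + (1-\mu)\psi_n$, $\mu \in (0,1)$. This can be written as a vector equation as follows:

\[ G_n(\psi_n, Y_n) = a_n(Y_n) + A_n(Y_n)(\psi_n - \psi_{0n}) + R_n(\psi_n, Y_n), \]

where

\[ [G_n(\psi_n, Y_n)]_i = \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_n}, \qquad [a_n(Y_n)]_i = \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_{0n}}, \qquad [A_n(Y_n)]_{ij} = \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}}, \]

and

\[ [R_n(\psi_n, Y_n)]_i = \sum_{j=1}^{p} \Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n^*} - \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \bigl([\psi_n]_j - [\psi_{0n}]_j\bigr). \]

(Observe that $a_n(Y_n)$ and $A_n(Y_n)$ only depend on $Y_n$ since $\psi_{0n}$ is considered known.) For $G_n$ as defined above, the matrix of first partial derivatives of $G_n$ has elements

\[ \frac{\partial [G_n(\psi_n, Y_n)]_i}{\partial\psi_j} = \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n}, \quad i,j=1,2,\ldots,p. \]
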
All the items required to apply Theorem 3.3.1 have now been presented with the exception of the matrix J which will be defined below. Theorem 3.3.1 will yield a sequence of estimates $\hat\psi_n(Y_n)$ of $\psi_{0n}$ which will be translated back to a sequence of estimates $\hat\theta_n(Y_n)$ of $\theta_0$. The sequence $\hat\theta_n(Y_n)$ will be a sequence of consistent and asymptotically normal roots of the likelihood equations.

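Let a p×p matrix J be defined in the following manner.

\[ [J]_{ij} = -\lim_{n\to\infty} E_{\theta_0}\Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr), \quad i,j=1,2,\ldots,p. \]

It is required that this matrix be positive definite. That the matrix is positive definite is proved in Section A.4.1. Some comments can be made appropriately at this point, however. Recall that $y \sim N_n(X\beta_0/n_{p_1+1},\, T_0)$ when $\psi_{0n}$ is the true parameter.$^2$ Further recall that

\[ T_0 = \sum_{i=0}^{p_1} \frac{[\tau_0]_i}{n_i}\, G_i = \sum_{i=0}^{p_1} \frac{n_i\sigma_{0i}}{n_i}\, G_i = \sum_{i=0}^{p_1} \sigma_{0i} G_i = \Sigma_0 \]

and

\[ X\frac{\beta_0}{n_{p_1+1}} = X\frac{n_{p_1+1}\,\alpha_0}{n_{p_1+1}} = X\alpha_0 \]

for all values of n. Thus $y \sim N_n(X\alpha_0, \Sigma_0)$ when $\theta = \theta_0$, for all values of n. Then it follows from the definitions of second derivatives given above that

\[ E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\beta\,\partial\beta'}\Big|_{\psi=\psi_{0n}} \Bigr) = -\frac{1}{n_{p_1+1}^2}\, X'T_0^{-1}X = -\frac{1}{n_{p_1+1}^2}\, X'\Sigma_0^{-1}X \to -C_0 \]

by Assumption 4.2.5;

\[ E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\tau_i\,\partial\beta}\Big|_{\psi=\psi_{0n}} \Bigr) = 0, \]

because $E_{\theta_0}(y) = X\beta_0/n_{p_1+1}$; and

\[ E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\tau_i\,\partial\tau_j}\Big|_{\psi=\psi_{0n}} \Bigr) = \frac{1}{2n_in_j}\Bigl[ \operatorname{tr} T_0^{-1}G_iT_0^{-1}G_j - 2E_{\theta_0}\Bigl\{ \Bigl(y - X\frac{\beta_0}{n_{p_1+1}}\Bigr)' T_0^{-1}G_iT_0^{-1}G_jT_0^{-1} \Bigl(y - X\frac{\beta_0}{n_{p_1+1}}\Bigr) \Bigr\} \Bigr] = -\frac{1}{2n_in_j} \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j \]

by Lemma B.1. Thus it remains to study the properties of the $(p_1+1)\times(p_1+1)$ matrix $C_1$ defined by

\[ [C_1]_{ij} = \frac{1}{2} \lim_{n\to\infty} \frac{1}{n_in_j} \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j, \quad i,j=0,1,\ldots,p_1. \]

$^2$ In all subsequent writing, the dependence on n will be maintained for the vector $\psi$, which will be called $\psi_n$, $\psi_{0n}$ or $\hat\psi_n$ on occasion; however, the corresponding T matrices and $\beta$ vectors will be called only $T_1$, $T$ or $T_0$ and $\beta_1$, $\beta$ or $\beta_0$, even though they are correctly $T_{0n}$, $\beta_{0n}$, etc. There is still, of course, a dependence on n even though it is suppressed in the notation.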


It can easily be shown that the lim inf and lim sup exist for the last expression, as will be done below. One of the assumptions of Theorem 4.4.1 is that the limit in fact exists, that is, that the lim inf equals the lim sup. This is not a grave assumption; again it merely eliminates disorganized sequences of experiments. The lim inf and lim sup can be shown to exist if it can be proved that the sequences are bounded. Observe that

\[ \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j = \operatorname{tr} \Sigma_0^{-1}U_iU_i'\Sigma_0^{-1}U_jU_j' \]

by definition of the $G_i$,

\[ = \operatorname{tr} U_j'\Sigma_0^{-1}U_iU_i'\Sigma_0^{-1}U_j = \operatorname{tr} (U_j'\Sigma_0^{-1}U_i)(U_j'\Sigma_0^{-1}U_i)' \ge 0 \]

because for any matrix A, $\operatorname{tr} AA' = \sum_i\sum_j a_{ij}^2 \ge 0$. Furthermore

\[ \frac{1}{n_in_j} \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j \le \frac{1}{n_in_j}\, \frac{\min(m_i, m_j)}{\sigma_{0i}\sigma_{0j}} \]

by Propositions A.3.8 and A.3.9 with $\Sigma_1 = \Sigma_2 = \Sigma_0$,

\[ \le \frac{B}{\sigma_{0i}\sigma_{0j}} \]

by Proposition A.3.10, where B is a finite constant not depending on n.

If B* is defined as

\[ B^* = \max_{i,j=0,1,\ldots,p_1} \frac{B}{\sigma_{0i}\sigma_{0j}}, \]

then B* is finite because $\theta_0$ is in the interior of the parameter space and thus all the $\sigma_{0i}$ are positive. Therefore

\[ 0 \le \frac{1}{2n_in_j} \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j \le \frac{1}{2} B^*, \quad i,j=0,1,\ldots,p_1, \]

for all n. Thus the sequences are bounded and either the limit $[C_1]_{ij}$ exists or the lim inf does not equal the lim sup. It remains to show that the matrix $C_1$ is positive definite. This is done as a part of the proof of Theorem 4.4.1 which will be stated and proved in the next sections.
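A small numerical illustration may help make the normalized traces concrete. The sketch below assumes a balanced one-way random-effects layout with `a` groups of size `r`, takes $G_0 = I$ and $G_1 = UU'$, and uses the normalizers $n_0 = \sqrt{ar}$ and $n_1 = \sqrt{a}$; these normalizers, the function names, and the closed-form expression (derived here from the two distinct eigenvalues $\sigma_0$ and $\sigma_0 + r\sigma_1$ of the covariance matrix) are assumptions of this example rather than quantities taken from Section 4.2.

```python
import numpy as np

def c1_entries(a, r, sigma0, sigma1):
    """Return (raw traces, normalized half-traces) for a balanced one-way layout.

    Assumed for this sketch only: G_0 = I, G_1 = U U', n_0 = sqrt(a*r), n_1 = sqrt(a).
    """
    n = a * r
    U = np.kron(np.eye(a), np.ones((r, 1)))
    G = [np.eye(n), U @ U.T]
    S_inv = np.linalg.inv(sigma0 * G[0] + sigma1 * G[1])
    norm = [np.sqrt(n), np.sqrt(a)]
    traces = np.array([[np.trace(S_inv @ Gi @ S_inv @ Gj) for Gj in G] for Gi in G])
    return traces, 0.5 * traces / np.outer(norm, norm)

def c1_closed_form(r, sigma0, sigma1):
    """Closed form of the same matrix; note that it does not depend on a."""
    lam = sigma0 + r * sigma1
    c00 = ((r - 1) / sigma0**2 + 1 / lam**2) / r
    c01 = np.sqrt(r) / lam**2
    c11 = r**2 / lam**2
    return 0.5 * np.array([[c00, c01], [c01, c11]])

r, sigma0, sigma1 = 3, 1.0, 0.5
results = {a: c1_entries(a, r, sigma0, sigma1) for a in (2, 4, 8)}
target = c1_closed_form(r, sigma0, sigma1)
```

In this layout the normalized entries are the same for every `a`, so the limit exists trivially; the raw traces are nonnegative and stay below $\min(m_i, m_j)/(\sigma_{0i}\sigma_{0j})$ with $m_0 = ar$ and $m_1 = a$.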


4.4  Consistency  and  Asymptotic  Normality  of  Maximum  Likelihood  Estimates 
Theorem 4.4.1, the basic theorem of this paper, states that in the model of Section 1.3 and under the assumptions in Sections 1.3 and 4.2, there is a root of the likelihood equations which is consistent and asymptotically normal. Consistent estimates of the various parameters are defined to be estimates converging in probability to the true parameters. However, it has been noted that the estimates of different parameters may converge at different rates. Similarly, the estimates are asymptotically normal, but only when normalized by the correct set of normalizing sequences, which may be different for different parameters.

This theorem will be proved by using the setup of Section 4.3 to apply Theorem 3.3.1 to this problem. The proof is given separately in Section 4.5. The details of the proof are given in Appendix A.

THEOREM 4.4.1. Consider the mixed model of the analysis of variance described in Section 1.3, under Assumptions 1.3.1-1.3.6. Consider a [...]. Let J be a p×p matrix defined by

\[ J = \begin{pmatrix} C_0 & 0 \\ 0 & C_1 \end{pmatrix}, \]

where $C_0$ is a $p_0 \times p_0$ matrix defined by

\[ C_0 = \lim_{n\to\infty} \frac{1}{n_{p_1+1}^2}\, X'\Sigma_0^{-1}X \]

and $C_1$ is a $(p_1+1)\times(p_1+1)$ matrix defined by

\[ [C_1]_{ij} = \frac{1}{2} \lim_{n\to\infty} \frac{1}{n_in_j} \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j, \quad i,j=0,1,\ldots,p_1. \]

Under these conditions it follows that J is positive definite. Furthermore, for each n there exists an estimator $\hat\theta_n(Y_n) = [\hat\alpha_n'(Y_n), \hat\sigma_n'(Y_n)]'$ of $\theta_0$ with the following properties.

i) Given $\epsilon > 0$ there exists $b = b(\epsilon)$ such that $0 < b < \infty$ and $n_0 = n_0(\epsilon)$ such that for all $n > n_0$

\[ P\Bigl\{ \frac{\partial\lambda(y,\theta)}{\partial\theta_i}\Big|_{\theta=\hat\theta_n(Y_n)} = 0,\ i=1,2,\ldots,p;\quad |[\hat\alpha_n(Y_n)]_j - \alpha_{0j}| < \frac{b}{n_{p_1+1}},\ j=1,2,\ldots,p_0;\quad |[\hat\sigma_n(Y_n)]_i - \sigma_{0i}| < \frac{b}{n_i},\ i=0,1,\ldots,p_1 \Bigr\} \ge 1-\epsilon. \]

ii) The p×1 vector whose first $p_0$ components are $n_{p_1+1}\{\hat\alpha_n(Y_n) - \alpha_0\}$ and whose $(p_0+i+1)$th component is $n_i\{[\hat\sigma_n(Y_n)]_i - \sigma_{0i}\}$, $i=0,1,\ldots,p_1$, converges in distribution to a $N_p(0, J^{-1})$ random variable.

The  proof  of  Theorem  4.4.1  follows  in  Section  4.5. 


4.5.  Proof of Theorem 4.4.1: Consistency and Asymptotic
Normality of Maximum Likelihood Estimates.

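As suggested in Section 4.3, Theorem 4.4.1 will be proved by applying Theorem 3.3.1 to the reparameterized problem given in Section 4.3. Recall that for each n the transformation used is $\theta \to \psi_n$, where $\theta' = (\alpha', \sigma')$, $\psi_n' = (\beta_n', \tau_n')$, $\beta_n = n_{p_1+1}\,\alpha$, and $[\tau_n]_i = n_i\sigma_i$, $i=0,1,\ldots,p_1$. Then a Taylor series is used to write the first derivatives of the log-likelihood as follows:

\[ \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_n} = \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_{0n}} + \sum_{j=1}^{p} \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n^*} \bigl([\psi_n]_j - [\psi_{0n}]_j\bigr), \]

where $\psi_n^* = \mu\psi_{0n} + (1-\mu)\psi_n$ and $\mu \in (0,1)$. As in Section 4.3, this can be written as a vector equation in proper form for the application of Theorem 3.3.1.

\[ G_n(\psi_n, Y_n) = a_n(Y_n) + A_n(Y_n)(\psi_n - \psi_{0n}) + R_n(\psi_n, Y_n), \]

where

\[ [G_n(\psi_n, Y_n)]_i = \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_n}, \qquad [a_n(Y_n)]_i = \frac{\partial\lambda(y,\psi)}{\partial\psi_i}\Big|_{\psi=\psi_{0n}}, \qquad [A_n(Y_n)]_{ij} = \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}}, \]

\[ [R_n(\psi_n, Y_n)]_i = \sum_{j=1}^{p} \Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n^*} - \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \bigl([\psi_n]_j - [\psi_{0n}]_j\bigr). \]

$a_n(Y_n)$ and $A_n(Y_n)$ depend only on $y$ because $\psi_{0n}$ is derived from $\theta_0$, which is known.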

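Theorem 3.3.1 states that under certain conditions (Conditions 3.3.1.i-3.3.1.vi) the following statements are true.

Given $\epsilon > 0$, there exists $b = b(\epsilon)$ such that $0 < b < \infty$ and $n_0 = n_0(\epsilon)$ such that for each $n > n_0$

\[ P\bigl\{ \text{there exists a root } \hat\psi_n(Y_n) \text{ of } G_n(\psi_n, Y_n) = 0 \text{ such that } \hat\psi_n(Y_n) \in S_b(\psi_{0n}) \bigr\} \ge 1-\epsilon; \]

furthermore,

\[ \hat\psi_n(Y_n) - \psi_{0n} \xrightarrow{d} N_p(0, J^{-1}). \]

Let $\hat\theta_n(Y_n)$ be the estimate of $\theta_0$ obtained from $\hat\psi_n(Y_n)$ by applying the inverse of the transformation above. Since $\sigma_i = [\tau_n]_i/n_i$, the chain rule gives

\[ \frac{\partial\lambda(y,\psi)}{\partial\tau_i}\Big|_{\psi=\hat\psi_n(Y_n)} = \frac{1}{n_i}\, \frac{\partial\lambda(y,\theta)}{\partial\sigma_i}\Big|_{\theta=\hat\theta_n(Y_n)}, \quad i=0,1,\ldots,p_1, \]

with the analogous relation, with factor $1/n_{p_1+1}$, for the components of $\beta$ and $\alpha$. Thus $G_n(\hat\psi_n(Y_n), Y_n) = 0$ is equivalent to

\[ \frac{\partial\lambda(y,\theta)}{\partial\theta_i}\Big|_{\theta=\hat\theta_n(Y_n)} = 0, \quad i=1,2,\ldots,p. \]

Furthermore, the statement that $\hat\psi_n(Y_n) \in S_b(\psi_{0n})$ is equivalent to the statement that $|[\hat\beta_n(Y_n)]_j - [\beta_{0n}]_j| < b$, $j=1,2,\ldots,p_0$, and $|[\hat\tau_n(Y_n)]_i - [\tau_{0n}]_i| < b$, $i=0,1,\ldots,p_1$, which is in turn equivalent to $|[\hat\alpha_n(Y_n)]_j - \alpha_{0j}| < b/n_{p_1+1}$, $j=1,2,\ldots,p_0$, and $|[\hat\sigma_n(Y_n)]_i - \sigma_{0i}| < b/n_i$, $i=0,1,\ldots,p_1$. In addition, the vector

\[ \hat\psi_n(Y_n) - \psi_{0n} = \begin{pmatrix} \hat\beta_n(Y_n) - \beta_{0n} \\ \hat\tau_n(Y_n) - \tau_{0n} \end{pmatrix}, \]

where $\hat\beta_n(Y_n) - \beta_{0n} = n_{p_1+1}\{\hat\alpha_n(Y_n) - \alpha_0\}$ and $[\hat\tau_n(Y_n)]_i - [\tau_{0n}]_i = n_i\{[\hat\sigma_n(Y_n)]_i - \sigma_{0i}\}$, $i=0,1,\ldots,p_1$. Thus the conclusions of Theorem 3.3.1 imply the conclusions of Theorem 4.4.1 for $\hat\theta_n(Y_n)$ as defined above. It then remains to show that all the conditions of Theorem 3.3.1 are satisfied under the assumptions for the problem as reparameterized.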

The first thing to be shown is that the matrix J defined in Theorem 4.4.1 is positive definite so that it can be used in Theorem 3.3.1. (The existence of the limits is guaranteed by the assumptions of Theorem 4.4.1.) Recall that a matrix J was defined in Section 4.3 by

\[ [J]_{ij} = -\lim_{n\to\infty} E_{\theta_0}\Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr), \quad i,j=1,2,\ldots,p. \]

This matrix is easily seen to be the same J defined in Theorem 4.4.1. This matrix is shown to be positive definite in Lemma A.4.1 in Section A.4.1. Now it must be proved that the assumptions of Theorem 4.4.1 imply the six conditions of Theorem 3.3.1. This is done in a series of four lemmae. First each condition is reduced to a simpler statement and then these statements are proved separately, each in a separate subsection. The conditions are not considered in the order i-vi but are considered in approximate order of increasing difficulty.

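Condition 3.3.1.ii requires that $a_n(Y_n) \xrightarrow{d} N_p(0, J)$; that is,

\[ \frac{\partial\lambda(y,\psi)}{\partial\psi}\Big|_{\psi=\psi_{0n}} \xrightarrow{d} N_p(0, J). \]

This is proved directly in Lemma A.4.2 in Section A.4.2. Condition 3.3.1.iii requires that if $D_n(Y_n) = J + A_n(Y_n)$, then $D_n(Y_n) \xrightarrow{p} 0$. It is clearly sufficient that the convergence in probability occurs for each element of $D_n(Y_n)$; that is, it is sufficient to show that $[D_n(Y_n)]_{ij} \xrightarrow{p} 0$, $i,j=1,2,\ldots,p$. But

\[ [D_n(Y_n)]_{ij} = [J]_{ij} + \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \]

\[ = \Bigl[ \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} - E_{\theta_0}\Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \Bigr] + \Bigl[ E_{\theta_0}\Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) - \lim_{n\to\infty} E_{\theta_0}\Bigl( \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \Bigr]. \]

Thus $[D_n(Y_n)]_{ij}$ breaks into two terms. The second obviously converges to zero by the definition of limits of real numbers. Thus it is sufficient to prove that the first term converges to zero in probability for each i and j. This is proved directly in Lemma A.4.3 in Section A.4.3.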

These first two conditions are proved directly; the remaining four will be proved indirectly in the following manner. Suppose there is a sequence of events, say $\{X_n\}$; it is of interest whether $P\{X_n\} \to 1$ as $n \to \infty$. This can be shown by showing that there exists another sequence of events $\{Z_n\}$ such that $Z_n \Rightarrow X_n$ for all n and that $P\{Z_n\} \to 1$ as $n \to \infty$. The object is to prove the probabilistic statement $P\{X_n\} \to 1$. This is done in two steps, one probabilistic and one not. Proving $P\{Z_n\} \to 1$ of course involves probability concepts; however, $Z_n \Rightarrow X_n$ is not a probabilistic statement and its proof is usually algebraic.
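The step from the implication to the probability statement is just monotonicity of probability. Written out (a short worked display, with the implication read as set inclusion):

```latex
Z_n \Rightarrow X_n
\quad\Longleftrightarrow\quad
Z_n \subseteq X_n
\quad\Longrightarrow\quad
P\{Z_n\} \le P\{X_n\} \le 1 ,
\qquad\text{so}\qquad
P\{Z_n\} \to 1 \ \text{ forces } \ P\{X_n\} \to 1 .
```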

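The first condition to be treated in the above manner is Condition 3.3.1.vi. This condition requires that for each $b > 0$, given $\epsilon > 0$ and $\delta > 0$, there exists $n_0(b,\epsilon,\delta)$ such that for all $n > n_0$

\[ P\Bigl\{ \sup_{\psi_n \in S_b(\psi_{0n})} \|E_n(\psi_n, Y_n)\| < \delta \Bigr\} \ge 1-\epsilon, \]

where $E_n(\psi_n, Y_n)$ is J plus the matrix of second derivatives of the log-likelihood evaluated at $\psi_n$. Again it is sufficient to prove the bound for each element of $E_n(\psi_n, Y_n)$. Now

\[ [E_n(\psi_n, Y_n)]_{ij} = \Bigl[ \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n} - \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr] + \Bigl[ \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} - E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \Bigr] + \Bigl[ E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) - \lim_{n\to\infty} E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \Bigr]. \]

Thus $[E_n(\psi_n, Y_n)]_{ij}$ breaks into three terms. The second goes to zero in probability, as was shown above, and the third goes to zero by definition of limit, as shown above. Thus it is sufficient to bound in probability the first term. This is done by using Conditions A.2.1 and A.3.1, which were shown in Sections A.2 and A.3 to occur in probability. It is shown in Lemma A.4.4 in Section A.4.4 that for b as above, if Conditions A.2.1 and A.3.1 are true then

\[ \sup_{\psi_n \in S_b(\psi_{0n})} \Bigl| \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n} - \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr| \to 0 \]

as $n \to \infty$. Now to get $n_0(b,\epsilon,\delta)$ for Condition 3.3.1.vi, first choose $n_1$ so that for $n > n_1$

\[ P\Bigl\{ \Bigl| \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} - E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \Bigr| < \frac{\delta}{3},\ i,j=1,2,\ldots,p \Bigr\} \ge 1 - \frac{\epsilon}{2}, \]

(which can be done by Lemma A.4.3). Then choose $n_2 \ge n_1$ such that for $n > n_2$

\[ \Bigl| E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) - \lim_{n\to\infty} E_{\theta_0}\Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \Bigr| < \frac{\delta}{3}, \quad i,j=1,2,\ldots,p, \]

(which can be done by the definition of limit). Then choose $n_3 \ge n_2$ such that for $n > n_3$,

\[ P\{\text{Conditions A.2.1 and A.3.1 are true}\} \ge 1 - \frac{\epsilon}{2}, \]

(which can be done by Proposition A.2.1 and the definition of limit). Finally, choose $n_0(b,\epsilon,\delta) \ge n_3$ such that for $n > n_0$

\[ \sup_{\psi_n \in S_b(\psi_{0n})} \Bigl| \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n} - \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr| < \frac{\delta}{3}, \]

(which can be done by Lemma A.4.4). Then this $n_0$ is the desired object of Condition 3.3.1.vi. This full line of reasoning will not be reproduced for the next three conditions. Analogous arguments of this type are used for each of the three.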
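The way bounds on three terms and two probability statements combine into a single statement can be written out explicitly. The display below is a sketch in generic notation: $t_1, t_2, t_3$ stand for the three terms of the decomposition, $A$ and $B$ for the two events whose probabilities are controlled, and the split into thirds and halves is the usual convention, assumed here for illustration.

```latex
|t_1 + t_2 + t_3| \le |t_1| + |t_2| + |t_3|
  < \tfrac{\delta}{3} + \tfrac{\delta}{3} + \tfrac{\delta}{3} = \delta ,
\qquad
P(A \cap B) \ge 1 - P(A^{c}) - P(B^{c})
  \ge 1 - \tfrac{\epsilon}{2} - \tfrac{\epsilon}{2} = 1 - \epsilon .
```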

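Condition 3.3.1.iv requires that for each $b > 0$, given $\epsilon > 0$ and $\delta > 0$, there exists $n_0(b,\epsilon,\delta)$ such that for $n > n_0$

\[ P\Bigl\{ \sup_{\psi_n \in S_b(\psi_{0n})} \|R_n(\psi_n, Y_n)\| < \delta \Bigr\} \ge 1-\epsilon. \]

As above, each element $[R_n(\psi_n, Y_n)]_i$ may be considered separately. Now

\[ [R_n(\psi_n, Y_n)]_i = \sum_{j=1}^{p} \Bigl( \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n^*} - \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr) \bigl([\psi_n]_j - [\psi_{0n}]_j\bigr), \]

where $\psi_n^* = \mu\psi_{0n} + (1-\mu)\psi_n$ and $\mu \in (0,1)$. But for $\psi_n \in S_b(\psi_{0n})$, $|[\psi_n]_j - [\psi_{0n}]_j| < b$, $j=1,2,\ldots,p$; furthermore

\[ \|\psi_n^* - \psi_{0n}\| = \|(1-\mu)\psi_n - (1-\mu)\psi_{0n}\| = (1-\mu)\|\psi_n - \psi_{0n}\| < (1-\mu)b, \]

so that $\psi_n^* \in S_b(\psi_{0n})$. Therefore it is sufficient to prove that

\[ \sup_{\psi_n \in S_b(\psi_{0n})} \Bigl| \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n} - \frac{\partial^2\lambda}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_{0n}} \Bigr| \to 0, \quad i,j=1,2,\ldots,p. \]

But this is exactly what is proved in Lemma A.4.4 using Conditions A.2.1 and A.3.1. Thus Condition 3.3.1.iv is proved.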

Condition 3.3.1.v requires that for any $b > 0$, given $\epsilon > 0$, there exists $n_0(b,\epsilon)$ such that for all $n > n_0$

\[ P\bigl\{ \text{the elements of } E_n(\psi_n, Y_n) \text{ are continuous functions of } \psi_n \text{ in } S_b(\psi_{0n}) \bigr\} \ge 1-\epsilon. \]

By the same logic as used above it is sufficient to prove that if Conditions A.2.1 and A.3.1 are true, then

\[ \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n} \]

is a uniformly continuous function of $\psi_n$ in $S_b(\psi_{0n})$, $i,j=1,2,\ldots,p$. This is done in Lemma A.4.5 in Section A.4.5.

Condition 3.3.1.i requires that for each $b > 0$, given $\epsilon > 0$, there exists $n_0(b,\epsilon)$ such that for all $n > n_0$

\[ P\bigl\{ G_n(\psi_n, Y_n) = a_n(Y_n) + A_n(Y_n)(\psi_n - \psi_{0n}) + R_n(\psi_n, Y_n) \text{ for all } \psi_n \in S_b(\psi_{0n}) \bigr\} \ge 1-\epsilon; \]

that is, the Taylor series expansion given at the beginning of this section must be valid in $S_b(\psi_{0n})$ with large probability. For the expansion to be valid, it is sufficient that all second derivatives are continuous functions of $\psi_n$ in $S_b(\psi_{0n})$. Again it is sufficient to show that if Conditions A.2.1 and A.3.1 are true then

\[ \frac{\partial^2\lambda(y,\psi)}{\partial\psi_i\,\partial\psi_j}\Big|_{\psi=\psi_n} \]

is a uniformly continuous function of $\psi_n$ in $S_b(\psi_{0n})$, $i,j=1,2,\ldots,p$. But this is what is proved in Lemma A.4.5. Thus Condition 3.3.1.i is proved.

As was shown above, the proofs of these six conditions enable Theorem 3.3.1 to be applied, which is used to prove Theorem 4.4.1 in the manner demonstrated in the beginning of this section. The five lemmae used to prove these conditions, plus other details, are given in Appendix A. The reader may easily omit these details without loss of continuity.

4.6.  A Note on the Asymptotic Efficiency of the Maximum Likelihood Estimates

It has been shown in the previous sections of this chapter that the maximum likelihood estimates in the mixed model of the analysis of variance are consistent and asymptotically normal. It then is of interest to know whether these estimates are asymptotically efficient in some sense. Efficiency has been defined in several different ways by various authors. The maximum likelihood estimators in this problem are efficient in the sense that the Cramér-Rao lower bound for the covariance matrix is asymptotically attained. The Cramér-Rao lower bound for the covariance matrix is the inverse of the Fisher information matrix, which is the negative of the expected value (under the true parameter) of the Hessian matrix of second derivatives of the log-likelihood. In the case of independent, identically distributed observations, this definition needs no elaboration; in the problem under study here more explanation is required.

The matrix considered in the independent, identically distributed case is derived as follows. The log-likelihood for n observations is the sum of the log-likelihoods of the n individual observations; therefore, the expected values of the derivatives for n observations are n times the expected values of the derivatives for one observation. The entire matrix is then normalized by 1/n and the limit taken. This is the rigorous definition of the information matrix J. That is

\[ [J]_{ij} = -\lim_{n\to\infty} \frac{1}{n}\, E_{\theta_0}\Bigl\{ \frac{\partial^2\lambda}{\partial\theta_i\,\partial\theta_j} \Bigr\}. \]

Thus although a limit of matrices is considered, what is arrived at is the information matrix for one observation.
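A standard worked case may make the normalization concrete. Take $y_1, \ldots, y_n$ independent $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$; this example is supplied here for illustration and is not part of the original text.

```latex
\lambda_n(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(y_k-\mu)^2 ,
\qquad
E_{\theta}\Bigl\{\frac{\partial^2\lambda_n}{\partial\mu^2}\Bigr\} = -\frac{n}{\sigma^2},
\quad
E_{\theta}\Bigl\{\frac{\partial^2\lambda_n}{\partial\mu\,\partial\sigma^2}\Bigr\} = 0,
\quad
E_{\theta}\Bigl\{\frac{\partial^2\lambda_n}{\partial(\sigma^2)^2}\Bigr\} = -\frac{n}{2\sigma^4},
\qquad
J = -\lim_{n\to\infty}\frac{1}{n}\,E_{\theta}\Bigl\{\frac{\partial^2\lambda_n}{\partial\theta\,\partial\theta'}\Bigr\}
  = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}.
```

Every expected second derivative is exactly n times its single-observation value, so dividing by n leaves the one-observation information matrix and the limit is trivial.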

In the problem considered here there is no one observation which does not depend on n. Thus limits analogous to that above must be considered. However each parameter may require its own normalizing sequence. For notational convenience let $n_i$ be the correct sequence for $\theta_i$; that is, the p×1 vector whose $i$th component is $n_i(\hat\theta_i - \theta_{0i})$ has a limiting normal distribution. (Note that each sequence $n_i$ depends on n.) This does not agree exactly with the notation of previous sections but does make the exposition in this section easier. The definition of an information matrix in this case which is analogous to the definition in the independent, identically distributed case is

\[ [J]_{ij} = -\lim_{n\to\infty} \frac{1}{n_in_j}\, E_{\theta_0}\Bigl\{ \frac{\partial^2\lambda}{\partial\theta_i\,\partial\theta_j} \Bigr\}, \quad i,j=1,2,\ldots,p. \]

A sequence of estimates is then said to be asymptotically efficient if the estimates are consistent and asymptotically normal and if the asymptotic covariance matrix is the inverse of the information matrix J defined above.


The estimates in the analysis of variance were shown to be consistent and asymptotically normal in Theorem 4.4.1. Furthermore, the asymptotic covariance matrix was $J^{-1}$, where J was defined in Theorem 4.4.1. But the matrix J is just exactly the information matrix described above and hence the maximum likelihood estimates in the mixed model of the analysis of variance are asymptotically efficient in the sense of attaining the Cramér-Rao lower bound for the covariance matrix.


CHAPTER  5 


COMPUTATION  OF  THE  MAXIMUM  LIKELIHOOD  ESTIMATES

5.1.  Introduction 

This chapter contains discussions of computational procedures for the calculation of the maximum likelihood estimates in the mixed model of the analysis of variance as described in Section 1.3. In Section 5.2 an equivalent form of the likelihood equations is derived. A simplification which occurs when the $G_i$ can be simultaneously diagonalized and also a case where explicit solutions to the likelihood equations exist are described in Section 5.3. Since explicit solutions seldom exist, an iterative procedure is proposed in Section 5.4 to solve the highly nonlinear likelihood equations. The iterative procedure proposed here was first suggested by Anderson (1971b), (1973). J. N. K. Rao (1973) has pointed out in a personal communication to Anderson that this iterative procedure is in effect the method of scoring; this is also discussed in Section 5.4. In Section 5.5 it is demonstrated that when explicit solutions exist, the iterative procedure of Section 5.4 yields those solutions in one iteration from any starting point. In Section 5.6 the problem of avoiding negative estimates is discussed.

In Sections 5.7 and 5.8 the iterative procedure of Section 5.4 is compared with a procedure suggested by Hartley and Rao (1967). Using four sample problems given in Hartley and Vaughn (1972), it was found that The Iterative Procedure was computationally more efficient than the Hartley-Rao-Vaughn algorithm. (In the remainder of Chapter 5 the iterative procedure proposed in Section 5.4 will be referred to as "The Iterative Procedure" and this will be abbreviated TIP. The algorithm developed by Hartley, Rao and Vaughn will be referred to as the Hartley-Rao-Vaughn algorithm and this will be abbreviated as the H-R-V algorithm.) A Monte Carlo study of The Iterative Procedure revealed that it was indeed a computationally efficient algorithm. Although it has not been proved here that The Iterative Procedure always converges, the Monte Carlo results indicate that it will always converge unless the convergence is to a set of negative estimates which make $\Sigma(\sigma)$ singular. These negative estimates occur only infrequently and such negative estimates are not of interest in any case. A numerical technique which was used as a modification of The Iterative Procedure made it even more computationally efficient.


5.2.  The  Likelihood  Equations  and  an  Equivalent  Form 

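As shown in Section 1.3 the likelihood equations which must be solved to obtain maximum likelihood estimates of $\alpha$ and $\sigma$ are

\[ \Bigl[ X'\Bigl(\sum_{j=0}^{p_1}\sigma_jG_j\Bigr)^{-1}X \Bigr]\alpha = X'\Bigl(\sum_{j=0}^{p_1}\sigma_jG_j\Bigr)^{-1}y \]

and

\[ \operatorname{tr}\Bigl(\sum_{j=0}^{p_1}\sigma_jG_j\Bigr)^{-1}G_i = \operatorname{tr}\Bigl(\sum_{j=0}^{p_1}\sigma_jG_j\Bigr)^{-1}G_i\Bigl(\sum_{j=0}^{p_1}\sigma_jG_j\Bigr)^{-1}C, \quad i=0,1,\ldots,p_1, \]

where

\[ C = (y - X\alpha)(y - X\alpha)'. \]

These may be rewritten with

\[ \Sigma(\sigma) = \sum_{j=0}^{p_1}\sigma_jG_j \]

as

\[ [X'\Sigma^{-1}(\sigma)X]\alpha = X'\Sigma^{-1}(\sigma)y, \]

\[ \operatorname{tr}\Sigma^{-1}(\sigma)G_i = \operatorname{tr}\Sigma^{-1}(\sigma)G_i\Sigma^{-1}(\sigma)C, \quad i=0,1,\ldots,p_1. \]

(Note that $\Sigma(\sigma)$ has an inverse because $\sigma_0 > 0$, $\sigma_i \ge 0$, $i=1,2,\ldots,p_1$, implies $\Sigma(\sigma)$ is positive definite and hence nonsingular.) An equivalent form of the second set of equations is obtained as follows. The following identity is true:

\[ I = \Sigma^{-1}(\sigma)\Sigma(\sigma) = \Sigma^{-1}(\sigma)\sum_{j=0}^{p_1}\sigma_jG_j = \sum_{j=0}^{p_1}\sigma_j\Sigma^{-1}(\sigma)G_j. \]

Substitute this identity into the second set of equations, yielding for a typical left hand side

\[ \operatorname{tr}\Sigma^{-1}(\sigma)G_i = \operatorname{tr}\Sigma^{-1}(\sigma)G_i\,I = \operatorname{tr}\Sigma^{-1}(\sigma)G_i\Sigma^{-1}(\sigma)\Bigl[\sum_{j=0}^{p_1}\sigma_jG_j\Bigr] = \sum_{j=0}^{p_1}\sigma_j\operatorname{tr}\Sigma^{-1}(\sigma)G_i\Sigma^{-1}(\sigma)G_j = [B(\sigma)\sigma]_i, \]

where $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_{p_1})'$ is a $(p_1+1)\times 1$ vector and $B(\sigma)$ is a $(p_1+1)\times(p_1+1)$ matrix whose $i,j$th element is given by

\[ [B(\sigma)]_{ij} = \operatorname{tr}\Sigma^{-1}(\sigma)G_i\Sigma^{-1}(\sigma)G_j, \quad i,j=0,1,\ldots,p_1. \]

Now define a $(p_1+1)\times 1$ vector $c(\sigma,\alpha)$ (because C depends on $\alpha$) by

\[ [c(\sigma,\alpha)]_i = \operatorname{tr}\Sigma^{-1}(\sigma)G_i\Sigma^{-1}(\sigma)C, \quad i=0,1,\ldots,p_1. \]

Then the second set of likelihood equations can be written in matrix form as

\[ B(\sigma)\sigma = c(\sigma,\alpha). \]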

/*w> 
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This  form  is  equivalent  to  the  original  form  since  it  results  merely 
by  multiplying  by  an  identity  matrix.  (Z  ^(ct)Z(ct)  =  I  for  any  vector 
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Furthermore, 


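$$\Sigma^{-1}(\sigma) = P\Lambda^{-1}(\sigma)P'.$$

Let $V = P'CP$ and rewrite the second set of likelihood equations as
follows.

For the left hand sides,

$$\operatorname{tr}\Sigma^{-1}(\sigma)G_i = \operatorname{tr}P\Lambda^{-1}(\sigma)P'P\Lambda_iP'$$

$$= \operatorname{tr}\Lambda^{-1}(\sigma)P'P\Lambda_iP'P$$

$$= \operatorname{tr}\Lambda^{-1}(\sigma)\Lambda_i.$$

For the right hand sides,

$$\operatorname{tr}\Sigma^{-1}(\sigma)G_i\Sigma^{-1}(\sigma)C = \operatorname{tr}P\Lambda^{-1}(\sigma)P'P\Lambda_iP'P\Lambda^{-1}(\sigma)P'C$$

$$= \operatorname{tr}\Lambda^{-1}(\sigma)\Lambda_i\Lambda^{-1}(\sigma)V.$$

Let the diagonal elements of $\Lambda_i$ be $\lambda_{ki}$, $k=1,2,\ldots,n$. Then the
likelihood equations become

$$\sum_{k=1}^{n}\frac{\lambda_{ki}}{\sum_{j=0}^{p_1}\sigma_j\lambda_{kj}} = \sum_{k=1}^{n}\frac{\lambda_{ki}v_{kk}}{\big(\sum_{j=0}^{p_1}\sigma_j\lambda_{kj}\big)^2}, \qquad i=0,1,\ldots,p_1,$$

where $v_{kk}$ is the $k$th diagonal element of $V$. Similarly, the equivalent
form of the likelihood equations becomes

$$\sum_{j=0}^{p_1}\sigma_j\sum_{k=1}^{n}\frac{\lambda_{ki}\lambda_{kj}}{\big(\sum_{l=0}^{p_1}\sigma_l\lambda_{kl}\big)^2} = \sum_{k=1}^{n}\frac{\lambda_{ki}v_{kk}}{\big(\sum_{l=0}^{p_1}\sigma_l\lambda_{kl}\big)^2}, \qquad i=0,1,\ldots,p_1.$$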


Another simplification occurs if $\alpha$ can be estimated independent of
$\sigma$. This will occur if $X$ is of the form $X = QF$ where $Q$ is $n\times p_0$ and
consists of $p_0$ of the columns of $P$ and $F$ is $p_0\times p_0$ and nonsingular.
Without loss of generality let $P = [Q : R]$; that is, let $Q$ be the first
$p_0$ columns of $P$. Also let

$$\Lambda(\sigma) = \begin{bmatrix}\Lambda_Q(\sigma) & 0\\ 0 & \Lambda_R(\sigma)\end{bmatrix}$$

be partitioned as $P$. Then

$$P'Q = \begin{bmatrix}Q'\\ R'\end{bmatrix}Q = \begin{bmatrix}I\\ 0\end{bmatrix}$$

and

$$\hat\alpha = [X'\Sigma^{-1}(\sigma)X]^{-1}X'\Sigma^{-1}(\sigma)y$$

$$= [F'Q'P\Lambda^{-1}(\sigma)P'QF]^{-1}F'Q'P\Lambda^{-1}(\sigma)P'y$$

$$= \Big[F'[I\;\,0]\Lambda^{-1}(\sigma)\begin{bmatrix}I\\ 0\end{bmatrix}F\Big]^{-1}F'[I\;\,0]\Lambda^{-1}(\sigma)\begin{bmatrix}Q'\\ R'\end{bmatrix}y$$

$$= [F'\Lambda_Q^{-1}(\sigma)F]^{-1}F'\Lambda_Q^{-1}(\sigma)Q'y$$

$$= F^{-1}\Lambda_Q(\sigma)F'^{-1}F'\Lambda_Q^{-1}(\sigma)Q'y$$

$$= F^{-1}Q'y,$$

which is independent of $\sigma$.

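One further simplification occurs which allows a closed form
solution of the likelihood equations. Following Anderson (1969:58-59),
let each

$$\Lambda_i = \begin{bmatrix}\lambda_{1i}I & & & \\ & \lambda_{2i}I & & \\ & & \ddots & \\ & & & \lambda_{p_2i}I\end{bmatrix},$$

where the orders of the identities on the main diagonal are $n_1,n_2,\ldots,n_{p_2}$
and the $n_k$'s do not depend on $i$. The object is to take account of the
multiplicities of the roots of $\Sigma(\sigma)$ or equivalently the diagonal elements
of $\Lambda(\sigma)$. Define $V_k$ for $k=1,2,\ldots,p_2$ by

$$V_1 = \frac{1}{n_1}\sum_{j=1}^{n_1}v_{jj},$$

$$V_2 = \frac{1}{n_2}\sum_{j=n_1+1}^{n_1+n_2}v_{jj},$$

$$\vdots$$

$$V_{p_2} = \frac{1}{n_{p_2}}\sum_{j=n-n_{p_2}+1}^{n}v_{jj},$$

where the $v_{jj}$ are defined by $V = P'CP = P'(y-X\hat\alpha)(y-X\hat\alpha)'P$. The
likelihood equations then simplify to

$$[X'\Sigma^{-1}(\sigma)X]\alpha = X'\Sigma^{-1}(\sigma)y,$$

$$\sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}}{\sum_{j=0}^{p_1}\sigma_j\lambda_{kj}} = \sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}V_k}{\big(\sum_{j=0}^{p_1}\sigma_j\lambda_{kj}\big)^2}, \qquad i=0,1,\ldots,p_1.$$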

If $\hat\alpha$ can be solved for independent of $\sigma$ and if the above notation
change is made, then $\hat\alpha, V_1, V_2, \ldots, V_{p_2}$ are a sufficient set of statistics
for the problem. Explicit solutions can be found in the case of
$p_2 = p_1+1$. Then the equations

$$\sum_{j=0}^{p_1}\sigma_j\lambda_{kj} = V_k, \qquad k=1,2,\ldots,p_2,$$

have $p_2$ equations in $p_1+1 = p_2$ unknowns and can be solved for $\sigma$. (As
Anderson points out, the matrix of coefficients will be nonsingular
because $G_0, G_1, \ldots, G_{p_1}$ are linearly independent [Assumption 1.3.5],
which implies that $\Lambda_0, \Lambda_1, \ldots, \Lambda_{p_1}$ are linearly independent.) Substituting
these values of $\sigma$ into the denominators of the likelihood equations
yields

$$\sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}}{V_k} = \sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}V_k}{V_k^2},$$

which shows that the $\sigma$ obtained above is indeed an explicit solution of
the likelihood equations.
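
The explicit solution can be illustrated numerically. The following is a minimal sketch for a balanced one-way random layout, which is not one of the examples treated in this chapter; the data, the layout sizes and all variable names are invented for the illustration. It forms $V_1$ and $V_2$ from the projections onto the two eigenspaces common to $G_0$ and $G_1$, solves the linear system for $\sigma$, and then checks that the original trace form of the likelihood equations is satisfied.

```python
import numpy as np

# Hypothetical balanced one-way random layout: I groups, J observations each.
I, J = 4, 3
n = I * J
y = np.array([5.1, 4.8, 5.6,  7.9, 8.4, 8.0,  3.2, 2.7, 3.5,  6.1, 6.6, 5.9])
Z = np.kron(np.eye(I), np.ones((J, 1)))     # group indicator columns
G = [np.eye(n), Z @ Z.T]                    # G_0 = I, G_1 = Z Z'

# Here X is a column of ones, so alpha_hat is the grand mean for every sigma.
r = y - y.mean()

# Projections onto the two eigenspaces shared by G_0 and G_1.
P1 = G[1] / J                               # root J of G_1, multiplicity I
P2 = np.eye(n) - P1                         # root 0 of G_1, multiplicity I*(J-1)
mult = np.array([I, I * (J - 1)])
V = np.array([r @ P1 @ r, r @ P2 @ r]) / mult

# lam[k, j] is the k-th distinct root of G_j.
lam = np.array([[1.0, J],
                [1.0, 0.0]])
sigma_hat = np.linalg.solve(lam, V)         # sum_j sigma_j * lam[k, j] = V_k

# Check tr(S^-1 G_i) = tr(S^-1 G_i S^-1 C) with C = r r'.
Sinv = np.linalg.inv(sigma_hat[0] * G[0] + sigma_hat[1] * G[1])
lhs = np.array([np.trace(Sinv @ Gi) for Gi in G])
rhs = np.array([r @ Sinv @ Gi @ Sinv @ r for Gi in G])
print(sigma_hat, np.allclose(lhs, rhs))
```

In this layout the system reduces to $\sigma_0 = V_2$ and $\sigma_0 + J\sigma_1 = V_1$, so the solution can also be verified by hand.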

Explicit  solutions  are  interesting  and  helpful  when  they  exist  but 
unfortunately  they  often  do  not  exist.  Thus  an  iterative  procedure  is 
required  to  solve  the  very  nonlinear  likelihood  equations.  Such  a 
procedure  is  discussed  in  the  next  section. 


5.4.  The  Iterative  Procedure 

Writing the likelihood equations in the equivalent matrix form
suggests a convenient, simple iterative procedure for their solution.
It is basically the method of functional iteration which is used often
in numerical analysis. The equations to solve are

$$[X'\Sigma^{-1}(\sigma)X]\alpha = X'\Sigma^{-1}(\sigma)y$$

and

$$B(\sigma)\sigma = c(\sigma,\alpha).$$

Since it is assumed that $X$ has full rank and that $\Sigma^{-1}(\sigma)$ is positive
definite, $X'\Sigma^{-1}(\sigma)X$ is nonsingular and the equation for $\alpha$ can be
solved to give

$$\hat\alpha(\sigma) = [X'\Sigma^{-1}(\sigma)X]^{-1}X'\Sigma^{-1}(\sigma)y.$$

Then there is only one equation to solve for $\sigma$, namely

$$B(\sigma)\sigma = c[\sigma,\hat\alpha(\sigma)].$$

If $B(\sigma)$ is nonsingular the equation may be restated as

$$\sigma = B^{-1}(\sigma)\,c[\sigma,\hat\alpha(\sigma)].$$

If a $(p_1+1)\times 1$ vector $\hat\sigma$ which satisfies the last equation can be found,
then $\hat\sigma$ and $\hat\alpha = \hat\alpha(\hat\sigma)$ will satisfy the likelihood equations. The last
equation suggests the following iterative procedure. (This iterative
procedure will be referred to in this and successive sections as The
Iterative Procedure, to be abbreviated TIP.)

Let $\sigma_{(0)}$ be any initial guess for $\sigma$. (The problem of choosing
$\sigma_{(0)}$ is discussed later in this section.) Then for $i=0,1,2,\ldots$

$$\sigma_{(i+1)} = B^{-1}(\sigma_{(i)})\,c[\sigma_{(i)},\hat\alpha(\sigma_{(i)})].$$

This process continues until $\sigma_{(i+1)}$ is sufficiently close to $\sigma_{(i)}$ in
some norm. If the iterative procedure is to make sense it must be
proved that $B(\sigma_{(i)})$ is nonsingular. This is done in the next proposition,
which is stated and proved as Lemma 2.1 in Anderson (1971:11-12).
It is restated and reproved here for completeness.
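
The iteration can be sketched in a few lines. The code below is an illustrative implementation only and is not the program of Appendix C; the function names `gls` and `tip`, the stopping rule, the tolerance and the small balanced one-way data set are all assumptions made for the sketch. Each pass forms $\Sigma(\sigma_{(i)})$, the generalized least squares estimate $\hat\alpha(\sigma_{(i)})$, the matrix $B$ and the vector $c$, and then solves $B\sigma = c$.

```python
import numpy as np

def gls(y, X, Sinv):
    """alpha_hat(sigma) = (X' S^-1 X)^-1 X' S^-1 y."""
    return np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ y)

def tip(y, X, G, sigma0, tol=1e-10, max_iter=300):
    """Iterate sigma <- B(sigma)^-1 c(sigma, alpha_hat(sigma)) until it settles."""
    sigma = np.asarray(sigma0, dtype=float)
    for it in range(1, max_iter + 1):
        Sinv = np.linalg.inv(sum(s * Gj for s, Gj in zip(sigma, G)))
        r = y - X @ gls(y, X, Sinv)               # residual at alpha_hat(sigma)
        M = [Sinv @ Gj for Gj in G]               # S^-1 G_j
        B = np.array([[np.trace(Mi @ Mj) for Mj in M] for Mi in M])
        u = Sinv @ r
        c = np.array([u @ Gj @ u for Gj in G])    # tr(S^-1 G_j S^-1 r r')
        new = np.linalg.solve(B, c)
        done = np.max(np.abs(new - sigma)) <= tol * max(1.0, np.max(np.abs(new)))
        sigma = new
        if done:
            Sinv = np.linalg.inv(sum(s * Gj for s, Gj in zip(sigma, G)))
            return gls(y, X, Sinv), sigma, it
    raise RuntimeError("TIP did not converge within max_iter iterations")

# Invented balanced one-way data: I = 3 groups, J = 4 observations per group.
I, J = 3, 4
y = np.array([10.2, 9.7, 10.9, 10.4, 14.1, 13.5, 14.8, 13.9, 7.3, 8.0, 7.6, 6.9])
X = np.ones((I * J, 1))
Z = np.kron(np.eye(I), np.ones((J, 1)))
G = [np.eye(I * J), Z @ Z.T]

alpha_hat, sigma_hat, iters = tip(y, X, G, sigma0=[1.0, 0.0])
print(alpha_hat, sigma_hat, iters)
```

The explicit inverse is used only to mirror the formulas; in practice the inversion may be carried out implicitly by solving linear systems.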


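PROPOSITION 5.4.1. If $\sigma_{(i)}$ is such that $\Sigma(\sigma_{(i)})$ is nonsingular then
$B(\sigma_{(i)})$ is positive definite.

PROOF. Given any nonzero $(p_1+1)\times 1$ vector $\delta = (\delta_0,\delta_1,\ldots,\delta_{p_1})'$ it must be shown
that $\delta'B(\sigma_{(i)})\delta$ is positive. But

$$\delta'B(\sigma_{(i)})\delta = \sum_{j=0}^{p_1}\sum_{k=0}^{p_1}[B(\sigma_{(i)})]_{jk}\,\delta_j\delta_k$$

$$= \sum_{j=0}^{p_1}\sum_{k=0}^{p_1}\operatorname{tr}\Sigma^{-1}(\sigma_{(i)})G_j\Sigma^{-1}(\sigma_{(i)})G_k\,\delta_j\delta_k$$

$$= \operatorname{tr}\Sigma^{-1}(\sigma_{(i)})\Big[\sum_{j=0}^{p_1}\delta_jG_j\Big]\Sigma^{-1}(\sigma_{(i)})\Big[\sum_{k=0}^{p_1}\delta_kG_k\Big]$$

$$= \operatorname{tr}\Big[\Sigma^{-1}(\sigma_{(i)})\Big(\sum_{j=0}^{p_1}\delta_jG_j\Big)\Big]^2,$$

which is positive unless $\Sigma^{-1}(\sigma_{(i)})\big(\sum_{j}\delta_jG_j\big) = 0$. (When $\Sigma(\sigma_{(i)})$ is positive
definite, this trace is the sum of squares of the elements of the symmetric
matrix $\Sigma^{-1/2}(\sigma_{(i)})\big(\sum_j\delta_jG_j\big)\Sigma^{-1/2}(\sigma_{(i)})$.) It is impossible
for $\Sigma^{-1}(\sigma_{(i)})\big(\sum_{j}\delta_jG_j\big)$ to be 0 because the $G_j$ are linearly independent
and $\Sigma^{-1}(\sigma_{(i)})$ is obviously nonsingular. ∎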


It should be noted that so long as the 0th component of $\sigma_{(i)}$
is positive and the others are nonnegative $\Sigma(\sigma_{(i)})$ will always be
nonsingular. Thus so long as the iterative process avoids negative values
it will continue unimpeded. (For a discussion of negative values for
components of $\sigma$, see Section 5.6.)

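J. N. K. Rao has pointed out that The Iterative Procedure is in
effect the method of scoring. That this is so can be seen by the
following remarks. The method of scoring obtains iterations in the following
manner. Given an estimate $\theta_{(k)}$ of a $p\times 1$ parameter $\theta$, calculate
$\mathcal{I}(\theta_{(k)})$, the $p\times p$ information matrix whose $i,j$ element is given by

$$[\mathcal{I}(\theta_{(k)})]_{ij} = -E\,\frac{\partial^2\lambda(y,\theta)}{\partial\theta_i\,\partial\theta_j}\Big|_{\theta=\theta_{(k)}}, \qquad i,j=1,2,\ldots,p,$$

and

$$\frac{\partial\lambda(y,\theta)}{\partial\theta}\Big|_{\theta=\theta_{(k)}},$$

the $p\times 1$ vector whose $i$th component is $\partial\lambda(y,\theta)/\partial\theta_i$ evaluated at
$\theta=\theta_{(k)}$, where $\lambda(y,\theta)$ is the log likelihood. Then the next iterate
$\theta_{(k+1)}$ is given by

$$\theta_{(k+1)} = \theta_{(k)} + \mathcal{I}^{-1}(\theta_{(k)})\,\frac{\partial\lambda(y,\theta)}{\partial\theta}\Big|_{\theta=\theta_{(k)}}.$$

It is easily seen that $\mathcal{I}(\theta_{(k)})$ for the case under study here is given
by (the notation used here is the same as that used previously in this
section)

$$\mathcal{I}(\theta_{(k)}) = \begin{bmatrix}X'\Sigma^{-1}(\sigma_{(k)})X & 0\\ 0' & \tfrac12B(\sigma_{(k)})\end{bmatrix}.$$

Furthermore,

$$\frac{\partial\lambda}{\partial\theta}\Big|_{\theta=\theta_{(k)}} = \begin{bmatrix}\partial\lambda/\partial\alpha\\ \partial\lambda/\partial\sigma\end{bmatrix}_{\theta=\theta_{(k)}},$$

where

$$\frac{\partial\lambda}{\partial\alpha}\Big|_{\theta=\theta_{(k)}} = X'\Sigma^{-1}(\sigma_{(k)})\,[y - X\alpha_{(k)}]$$

and

$$\Big[\frac{\partial\lambda}{\partial\sigma}\Big]_i = -\tfrac12\big[\operatorname{tr}\Sigma^{-1}(\sigma_{(k)})G_i - \operatorname{tr}\Sigma^{-1}(\sigma_{(k)})G_i\Sigma^{-1}(\sigma_{(k)})C\big]$$

$$= \tfrac12\big\{c[\sigma_{(k)},\hat\alpha(\sigma_{(k)})] - B(\sigma_{(k)})\sigma_{(k)}\big\}_i, \qquad i=0,1,\ldots,p_1.$$

Then $\theta_{(k+1)} = (\alpha_{(k+1)}', \sigma_{(k+1)}')'$ is given by

$$\alpha_{(k+1)} = \alpha_{(k)} + \hat\alpha(\sigma_{(k)}) - \alpha_{(k)} = \hat\alpha(\sigma_{(k)})$$

and

$$\sigma_{(k+1)} = \sigma_{(k)} + B^{-1}(\sigma_{(k)})\,c[\sigma_{(k)},\hat\alpha(\sigma_{(k)})] - \sigma_{(k)}$$

$$= B^{-1}(\sigma_{(k)})\,c[\sigma_{(k)},\hat\alpha(\sigma_{(k)})].$$

Thus the method of scoring yields the same equations as The Iterative
Procedure.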

The Iterative Procedure described above has the advantage of being
simple, easy to describe, and easy to program. (A computer program
implementing this algorithm is described in Appendix C.) Other methods
for solving the likelihood equations have been developed. Hartley and J. N. K.
Rao (1967) suggested solution by the method of steepest ascent. They
numerically integrated out a system of simultaneous differential
equations to obtain solutions. They gave a proof of the convergence
of their algorithm. Hartley and Vaughn (1972) presented a computer
program written by Vaughn (1970) implementing the Hartley-Rao
algorithm. This program is quite complicated; it proceeds
in the following manner. A least squares approximation is obtained to
the differential equations to be solved. This approximation is
numerically integrated by the Runge-Kutta method until a solution is obtained.
These two steps are repeated until convergence is obtained. The reason
an approximation must be used is that numerical integration methods may
require large numbers of iterations and a large amount of effort is
required to evaluate the likelihood equations. A comparison of The
Iterative Procedure and the method of Hartley, Rao, and Vaughn on some
sample problems presented in Hartley and Vaughn (1972) is given in
Section 5.7. It can be said at this point that The Iterative Procedure
seems to be computationally much more efficient, at least in most cases.

Choosing the initial estimator $\sigma_{(0)}$ is of some importance in The
Iterative Procedure. The closer $\sigma_{(0)}$ is to the true solution, the
easier it will be for the algorithm to iterate to that solution.
Anderson (1971b:12-13) considers the case where the mean of $y$ is either
known or completely unspecified and there are $N$ observations on $y$. He
suggests as $\sigma_{(0)}$ the solution of the equations

$$\sum_{j=0}^{p_1}\sigma_j\operatorname{tr}AG_iAG_j = \operatorname{tr}AG_iAC, \qquad i=0,1,\ldots,p_1,$$

where $A$ is an arbitrary positive definite matrix and $C$ is the sample
covariance matrix of the $N$ observations, taken about $\mu$ if $\mu$ is known
or about the sample mean if $\mu$ is completely unspecified. In both cases
$C$ is an unbiased estimate of $\Sigma$ and hence $\sigma_{(0)}$ will be an unbiased
estimate of $\sigma$. One choice suggested for $A$ is $I$, the identity matrix.
It has been found in sample problems that the above method does not work
very well in practice. It often seems to take many iterations to reach
convergence from the above $\sigma_{(0)}$ while other types of guesses work better.
Anderson mentions some other methods for the case above. For the analysis
of variance the following methods may work well. The usual analysis of
variance estimates, if they are available, or approximations to them can be
used. A rough approximation to the sum of squares for each factor
may work fairly well if that is all that is available. Prior knowledge
may be put into the initial guess if desired. If no information at all
is available, $\sigma_{(0)} = (1,0,\ldots,0)'$ may be used; this corresponds to choosing
$A = I$ in the Anderson method above. The choice of the initial guess is
not critical but some attempt should be made to make reasonable choices.
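
The choice $A = I$ can be made concrete with a small sketch. It is assumed here, for illustration only, that a single observation vector is available, that the fixed effects are removed by ordinary least squares, and that $C$ is taken to be the outer product of the resulting residuals; the function name `initial_guess_identity` and the four data values are hypothetical. Under these assumptions the system solved below is the one that the first step of The Iterative Procedure solves when started from $(1,0,\ldots,0)'$.

```python
import numpy as np

def initial_guess_identity(y, X, G):
    """Solve sum_j sigma_j tr(G_i G_j) = tr(G_i C) with A = I and C = r r'."""
    # With A = I the fixed effects are fitted by ordinary least squares.
    alpha = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ alpha
    T = np.array([[np.trace(Gi @ Gj) for Gj in G] for Gi in G])
    d = np.array([r @ Gi @ r for Gi in G])       # tr(G_i r r')
    return np.linalg.solve(T, d)

# Tiny invented balanced one-way layout: 2 groups of 2 observations.
y = np.array([1.0, 3.0, 5.0, 7.0])
X = np.ones((4, 1))
Z = np.kron(np.eye(2), np.ones((2, 1)))
G = [np.eye(4), Z @ Z.T]

sigma_start = initial_guess_identity(y, X, G)
print(sigma_start)
```

For these four values the system is $4\sigma_0 + 4\sigma_1 = 20$ and $4\sigma_0 + 8\sigma_1 = 32$, so the starting value is $(2, 3)'$.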

The advantages and use of this method have been alluded to. At
this point an attempt must be made to answer the three critical questions
one must ask of any numerical procedure.

1) Does it converge?

2) If it does converge, to what does it converge? Is the
result a root of the likelihood equations?

3) Is the answer unique?

Unfortunately complete answers are not known to any of these questions.
An answer to all three is given in the next section for a special case.
To study the first two in general, Monte Carlo studies were used. These
results are described in Section 5.8. More complete answers to these
questions may be found after further research. At this point the
excellent Monte Carlo results and the fact that The Iterative Procedure
is in fact the method of scoring, which has been accepted for many
years, encourage the use of The Iterative Procedure even though these
questions remain unanswered.


5.5. A Case Where the Iterative Procedure Gives Exact Solutions in
One Iteration

In the event that the conditions described in Section 5.3 for the
existence of explicit solutions of the likelihood equations are
satisfied, The Iterative Procedure works very well indeed. In fact it will
yield the exact solutions to the likelihood equations in one iteration
starting from any initial guess. Recall that these conditions are the
following.

CONDITION 5.5.1. There exists an $n\times n$ orthogonal matrix $P$ such that
$P'G_iP = \Lambda_i$, $i=0,1,\ldots,p_1$, where

$$\Lambda_i = \begin{bmatrix}\lambda_{1i}I & & & \\ & \lambda_{2i}I & & \\ & & \ddots & \\ & & & \lambda_{p_2i}I\end{bmatrix}$$

with the orders of the identities $n_1,n_2,\ldots,n_{p_2}$ respectively. The $n_k$'s
do not depend on $i$.

CONDITION 5.5.2. $X = QF$ where $Q$ consists of (without loss of generality)
the first $p_0$ columns of $P$ and $F$ is a nonsingular $p_0\times p_0$ matrix.

CONDITION 5.5.3. $p_2 = p_1+1$.

When these conditions are true a solution to the likelihood equations
is $\hat\alpha = F^{-1}Q'y$ and $\hat\sigma$ the solution of

$$\sum_{j=0}^{p_1}\hat\sigma_j\lambda_{kj} = V_k, \qquad k=1,2,\ldots,p_2,$$

where the $V_k$ are defined from the matrix $V = P'(y-X\hat\alpha)(y-X\hat\alpha)'P$ as follows:

$$V_1 = \frac{1}{n_1}\sum_{j=1}^{n_1}v_{jj},$$

$$V_2 = \frac{1}{n_2}\sum_{j=n_1+1}^{n_1+n_2}v_{jj},$$

$$\vdots$$

$$V_{p_2} = \frac{1}{n_{p_2}}\sum_{j=n-n_{p_2}+1}^{n}v_{jj}.$$


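The following Proposition is true.

PROPOSITION 5.5.1. If Conditions 5.5.1-5.5.3 are true, and $\sigma_{(0)}$ is any
initial guess for $\sigma$ such that $\Sigma(\sigma_{(0)})$ is nonsingular, then The Iterative
Procedure yields

$$\sigma_{(1)} = B^{-1}(\sigma_{(0)})\,c[\sigma_{(0)},\hat\alpha(\sigma_{(0)})] = \hat\sigma,$$

where $\hat\sigma$ is the solution to $\sum_{j=0}^{p_1}\hat\sigma_j\lambda_{kj} = V_k$, $k=1,2,\ldots,p_2$.

PROOF. Let $\sigma_{(0)} = (\sigma_0^{(0)},\ldots,\sigma_{p_1}^{(0)})'$. Then the matrix $B(\sigma_{(0)})$ is given
by

$$[B(\sigma_{(0)})]_{ij} = \operatorname{tr}\Sigma^{-1}(\sigma_{(0)})G_i\Sigma^{-1}(\sigma_{(0)})G_j = \sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}\lambda_{kj}}{\big(\sum_{l=0}^{p_1}\sigma_l^{(0)}\lambda_{kl}\big)^2}.$$

Hence the $i$th element of $B(\sigma_{(0)})\hat\sigma$ is

$$\sum_{j=0}^{p_1}[B(\sigma_{(0)})]_{ij}\hat\sigma_j = \sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}\big(\sum_{j=0}^{p_1}\lambda_{kj}\hat\sigma_j\big)}{\big(\sum_{l=0}^{p_1}\sigma_l^{(0)}\lambda_{kl}\big)^2}$$

$$= \sum_{k=1}^{p_2}\frac{n_k\lambda_{ki}V_k}{\big(\sum_{l=0}^{p_1}\sigma_l^{(0)}\lambda_{kl}\big)^2}$$

$$= \{c[\sigma_{(0)},\hat\alpha]\}_i,$$

which is the right hand side of the equation. Thus the unique solution
$\sigma_{(1)}$ is just $\hat\sigma$, the solution to

$$\sum_{j=0}^{p_1}\hat\sigma_j\lambda_{kj} = V_k \qquad \text{for } k=1,2,\ldots,p_2.$$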
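
The one-step property is easy to observe numerically. The sketch below writes a single step of The Iterative Procedure in terms of the distinct roots, using the expressions for $B$ and $c$ that hold under Conditions 5.5.1-5.5.3. The layout is a twofold nested one with I = 4, J = 3, K = 2; the values of $V_k$, the starting points and the function name `tip_step_reduced` are invented for the illustration.

```python
import numpy as np

def tip_step_reduced(sigma, mult, lam, V):
    """One TIP step written with the distinct roots of Sigma(sigma).

    mult[k] is the multiplicity n_k, lam[k, j] the k-th root of G_j,
    and V[k] the averaged statistic V_k.
    """
    rho = lam @ sigma                        # distinct roots of Sigma(sigma)
    w = mult / rho**2
    B = lam.T @ (w[:, None] * lam)           # B_ij = sum_k n_k lam_ki lam_kj / rho_k^2
    c = lam.T @ (w * V)                      # c_i  = sum_k n_k lam_ki V_k / rho_k^2
    return np.linalg.solve(B, c)

# Twofold nested layout, I = 4, J = 3, K = 2; the V values are invented.
I, J, K = 4, 3, 2
mult = np.array([I, I * (J - 1), I * J * (K - 1)], dtype=float)
lam = np.array([[1.0, J * K, K],
                [1.0, 0.0,   K],
                [1.0, 0.0,   0.0]])
V = np.array([30.0, 4.0, 1.5])

exact = np.linalg.solve(lam, V)              # sum_j sigma_j * lam[k, j] = V_k
rng = np.random.default_rng(0)
starts = rng.uniform(0.1, 10.0, size=(5, 3)) # arbitrary positive starting points
one_step = np.array([tip_step_reduced(s, mult, lam, V) for s in starts])
print(exact, np.allclose(one_step, exact))
```

Every starting point leads to the same vector after one step because, with a square nonsingular matrix of roots, the weights $n_k/\rho_k^2$ cancel when $B\sigma = c$ is solved.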


It is reassuring to know that if there exist exact solutions, The
Iterative Procedure will pick them out immediately even if the user of
the method is unaware that they exist. The Hartley-Rao-Vaughn algorithm
does not have this property. However it does have the property that it
contains built-in traps to guarantee that it never produces negative
estimates. Of course the maximum likelihood estimates are never negative.
The Iterative Procedure as presently constructed does not inherently
avoid negative estimates. However, it can be adjusted so that
the final answers are nonnegative. This is discussed in the next
section.


5.6.  Avoiding  Negative  Estimates 

The  general  problem  of  negative  estimates  of  variance  components 
has been discussed by many authors. (See Searle [1971:22] for a
summary.) Of course, maximum likelihood estimates cannot be negative;
negative  estimates  do  not  belong  in  the  parameter  space  and  hence  are 
ineligible  for  maximum  likelihood  estimation. 

The Iterative Procedure defined above does not by its nature
converge to nonnegative estimates. In fact it may very well converge
to negative estimates. However, a simple modification allows one to
avoid negative estimates as final answers. The general solution when
any estimates are computed as negative is to fix them at zero and solve
the remaining likelihood equations subject to these constraints; that
is, if $\hat\sigma_i$ would have been negative for $i \in S$, where $S$ is some set of
indices, the new equations become

$$\frac{\partial\lambda}{\partial\alpha} = 0,$$

$$\frac{\partial\lambda}{\partial\sigma_j} = 0, \qquad j=0,1,\ldots,p_1,\; j\notin S,$$

$$\sigma_i = 0, \qquad i\in S.$$

This is the method used by Hartley, Rao, and Vaughn in their algorithm.
They are able to build this device right into the iterations. This
cannot be done in The Iterative Procedure. (It was tried but did not
work out well; it induced nonconvergence.) However, it is easy to see
how to get around the problem. The new likelihood equations are just
the equations for the following new model:

$$y = X\alpha + \sum_{\substack{j=0\\ j\notin S}}^{p_1}U_jb_j,$$

where $U_0 = I$ and $b_0 = e$ and the other $U_j$ and $b_j$ are as defined in
Section 1.3. But this model has exactly the same form as the old
model for which The Iterative Procedure gave negative estimates; it is
just smaller. Thus when The Iterative Procedure yields negative
estimates, all that must be done is to reformulate the reduced problem
above and resubmit this to The Iterative Procedure. One continues in
this manner until no negative estimates are obtained. This procedure
avoids entirely any negative estimates being reported as maximum
likelihood estimates. The computer program presented in Appendix C does
not automatically perform the above operations, but with a simple but
major overhauling of the program, this could be accomplished. Then The
Iterative Procedure would be comparable to the Hartley-Rao-Vaughn
algorithm.

One serious problem of negative estimates in The Iterative
Procedure was alluded to in Section 5.1. The problem is that if at
any stage a negative estimate for any of the $\sigma_i$'s occurs, then the
matrix $\Sigma(\sigma)$ may be singular or nearly singular. When this happens The
Iterative Procedure will become very unstable and may even blow up. In
fact, this actually occurred in the Monte Carlo studies of The Iterative
Procedure and was the only cause of nonconvergence of The Iterative
Procedure (see Section 5.8). Unfortunately no technique exists at the
moment for eliminating these difficulties. The subject is undergoing
further study.
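
The resubmission scheme described earlier in this section can be sketched as follows. This is not the program of Appendix C. To keep the example short it assumes that the distinct-root form of Section 5.3 applies, so that dropping a component amounts to dropping a column of the matrix of roots; the function names, the tolerance and the numbers are all hypothetical.

```python
import numpy as np

def tip_reduced(mult, lam, V, sigma0, tol=1e-10, max_iter=300):
    """TIP on the distinct-root form of the likelihood equations."""
    sigma = np.asarray(sigma0, dtype=float)
    for _ in range(max_iter):
        w = mult / (lam @ sigma) ** 2
        new = np.linalg.solve(lam.T @ (w[:, None] * lam), lam.T @ (w * V))
        if np.max(np.abs(new - sigma)) <= tol * max(1.0, np.max(np.abs(new))):
            return new
        sigma = new
    raise RuntimeError("TIP did not converge")

def fit_nonnegative(mult, lam, V):
    """Refit with negative components fixed at zero until none is negative."""
    p = lam.shape[1]
    active = list(range(p))                  # indices j not in S
    while True:
        start = np.zeros(len(active))
        start[0] = 1.0                       # starting value (1, 0, ..., 0)
        est = tip_reduced(mult, lam[:, active], V, start)
        # sigma_0 is never dropped in this sketch, since G_0 = I must stay.
        negative = [j for j, s in zip(active, est) if s < 0 and j != 0]
        if not negative:
            sigma = np.zeros(p)
            sigma[active] = est
            return sigma
        active = [j for j in active if j not in negative]

# Invented balanced one-way summary: 5 groups of 4, between-group V_1 < V_2.
mult = np.array([5.0, 15.0])
lam = np.array([[1.0, 4.0],
                [1.0, 0.0]])
V = np.array([0.8, 2.0])

sigma = fit_nonnegative(mult, lam, V)
print(sigma)
```

In this example the first pass gives $\sigma_1 = (0.8 - 2.0)/4 < 0$, so the component is fixed at zero and the refit pools the two statistics, giving $\sigma_0 = (5 \times 0.8 + 15 \times 2.0)/20 = 1.7$.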

Even with these difficulties The Iterative Procedure still performs
very well in comparison with its competition, the Hartley-Rao-Vaughn
algorithm. Some comparisons are given in the next section.


5.7.  Comparison  of  The  Iterative  Procedure  with  the  Hartley-Rao-Vaughn 
Algorithm 

Hartley and Vaughn (1972:135-144) give four examples of maximum
likelihood estimation involving real data. These examples will not be
reproduced here; only the performance of the Hartley-Rao-Vaughn algorithm
(to be abbreviated H-R-V for the rest of this chapter) as stated in
Hartley and Vaughn and of The Iterative Procedure (to be abbreviated TIP)
on these problems will be compared. These comparisons will point out the
computational efficiency of The Iterative Procedure. The H-R-V algorithm
computes variance ratios (that is, it computes $\sigma_0$ and $\gamma_i = \sigma_i/\sigma_0$ for
$i=1,2,\ldots,p_1$) instead of variances. The results of TIP have been converted
to this form for comparison. The problems were fed into the computer program
described in Appendix C using the same initial guesses Hartley and
Vaughn used (converted of course to variances). Efficiency will be
measured by comparing the number of inversions¹ of a matrix of the
form $\Sigma(\sigma)$ which are required (each method must perform such calculations).
Both methods require many other calculations, with the H-R-V requiring
many more, but these inversions are the major computational effort.
The Iterative Procedure requires either one or two inversions per
iteration (see Section 5.8 and Appendix C) while the H-R-V algorithm
requires

$$\frac{(p_1+1)(p_1+2)}{2} + 1$$

inversions for each iteration to make its
approximation (Hartley and Vaughn [1972:133]). It will be seen that
each procedure requires approximately the same number of iterations to
converge, with consequent great savings for The Iterative Procedure in
computational effort. The results of the four sample problems given by
Hartley and Vaughn were as follows.

1) The Twofold Nested Model

$$y_{ijk} = \mu + a_i + b_{ij} + e_{ijk}, \qquad i=1,2,\ldots,I,\quad j=1,2,\ldots,J,\quad k=1,2,\ldots,K,$$

where $\mu$ is fixed and the a's, b's and e's are the random effects. Such
a model is described in Section 6.3 with the a's as fixed effects. For
this model, there exist exact solutions and so as in Section 5.5, The
Iterative Procedure achieves them in one iteration. (In the sample
problem of Hartley and Vaughn, I=4, J=3, K=2, for n=24.) However, the
computer requires one more iteration to recognize convergence, so TIP
requires two iterations to achieve the final result, which agrees with
the exact solution. A total of two inversions of the matrix $\Sigma(\sigma)$ are
required. The H-R-V algorithm required three complete cycles and hence
required a total of $3\big[\frac{(2+1)(2+2)}{2} + 1\big] = 21$ inversions to obtain answers
which agreed with the exact answers to only two decimal places for the
variance ratios. Of course, had a more stringent convergence criterion
been applied, H-R-V would have attained better agreement with the exact
answers at a cost of more iterations.

The final results were

                 $\hat\sigma_0$      $\hat\gamma_1$     $\hat\gamma_2$
        Exact    0.066542    39.106     24.204
        H-R-V    0.066549    39.095     24.1999
        TIP      0.066542    39.106     24.204

1. Of course the inversion may be only done implicitly as in solving
a number of linear equations.

2) Twofold Nested Model When One Variance Ratio is Zero

This model is the same as above but for these data the variance
ratio for the a effect is calculated as negative and hence set to zero.
(In the sample problem of Hartley and Vaughn, I=5, J=2, K=2, for n=20.)
The Iterative Procedure requires two iterations to recognize the
negative variance and two more to compute the final answer in the
reduced model as described in Section 5.6. Thus a total of four
inversions were required. The H-R-V algorithm required four complete
cycles and hence 4×7=28 inversions to obtain the final answers. Both
answers agreed with the exact answers.

The final results were

                 $\hat\sigma_0$     $\hat\gamma_1$    $\hat\gamma_2$
        Exact    0.0387     0.0       0.35695
        H-R-V    0.0387     0.0       0.35696
        TIP      0.0387     0.0       0.35695

3) Unbalanced One-Way Classification

$$y_{ij} = \mu + a_i + e_{ij}, \qquad i=1,2,\ldots,I,\quad j=1,2,\ldots,J_i.$$

This model is discussed in Section 6.4. The a's and e's are random
effects and $\mu$ is a fixed effect. No exact solutions exist here. (In
the sample problem of Hartley and Vaughn, I=5 and the $J_i$'s are 5,3,2,3,1,
for n=14.) The Iterative Procedure requires five iterations involving
a total of 6 inversions to converge. The H-R-V algorithm required five
iterations with a consequent $5\big[\frac{(1+1)(1+2)}{2} + 1\big] = 20$ inversions. The
answers agreed to three decimal places. (Hartley and Vaughn only gave
the answers to three decimal places in this problem.)

The final results were

                 $\hat\sigma_0$    $\hat\gamma_1$
        H-R-V    0.773     0.371
        TIP      0.773     0.671

4) Two-Way Classification with Interaction

$$y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}, \qquad i=1,2,\ldots,I,\quad j=1,2,\ldots,J,\quad k=1,2,\ldots,K,$$

where $\mu$ is a fixed effect and the a's, b's, c's and e's are random
effects. This model is discussed in Sections 6.1 and 6.2. No closed
form solutions exist for this model. (In the sample problem of Hartley
and Vaughn I=2, J=3, K=3, for n=18.) The Iterative Procedure required
12 iterations involving a total of 13 inversions to reach a solution.
The H-R-V algorithm required 15 iterations with a consequent
$15\big[\frac{(3+1)(3+2)}{2} + 1\big] = 165$ inversions. The answers for the two procedures
only agree to two or three digits but as far as is possible to know The
Iterative Procedure gives more accurate results. (For example, part of
the solution can be obtained exactly in this model; see Section 6.2.)
Again the H-R-V algorithm could be made to converge more closely, but at
the cost of more iterations.

The final results were

                 $\hat\sigma_0$    $\hat\gamma_1$    $\hat\gamma_2$    $\hat\gamma_3$
        H-R-V    69.75     9.5^      12.82     0.326
        TIP      69.78     9.52      12.79     0.327

These four examples point out that The Iterative Procedure seems
to be computationally much more efficient than the H-R-V algorithm. To
be sure, it may be unfair to compare in cases 1) and 2) where The
Iterative Procedure gives exact solutions and the H-R-V just iterates. Still,
it is an advantage to give exact solutions when they exist. In the cases
where exact solutions do not exist the comparison is just as dramatic.

One can hardly generalize from four examples, but it is true that a
general pattern develops that if one method requires many iterations,
so will the other; however, the H-R-V algorithm pays a much higher
price per iteration. This pattern continues in the Monte Carlo studies
in the next section. The H-R-V method has an advantage that it has
been proven to converge. Such a result has not been proved for The
Iterative Procedure but Monte Carlo results have been most encouraging.
These are presented in the next section.
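
The inversion counts used in this comparison follow directly from the per-iteration formula. The short sketch below (the function name and the tuple layout are chosen for illustration) reproduces the H-R-V totals of 21, 28 and 20 quoted for the first three problems and applies the same formula to the fourth problem, assuming $p_1 = 3$ there.

```python
def hrv_inversions(p1, iterations):
    """Inversions used by H-R-V: (p1 + 1)(p1 + 2)/2 + 1 per iteration."""
    per_iteration = (p1 + 1) * (p1 + 2) // 2 + 1
    return iterations * per_iteration

# (label, p1, H-R-V iterations, TIP inversions reported in the text)
problems = [
    ("twofold nested", 2, 3, 2),
    ("twofold nested, one ratio zero", 2, 4, 4),
    ("unbalanced one-way", 1, 5, 6),
    ("two-way with interaction", 3, 15, 13),
]
for label, p1, hrv_iters, tip_inv in problems:
    print(f"{label}: H-R-V {hrv_inversions(p1, hrv_iters)}, TIP {tip_inv}")
```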
5.8. Monte Carlo Results

As noted in previous sections, it has not been proved here that
The Iterative Procedure is guaranteed to converge. Thus an attempt
has been made to demonstrate its effectiveness by Monte Carlo methods.
The results have been most encouraging. The following procedure was
used. Only one model, the two-way crossed balanced design described
in Sections 6.1 and 6.2, was used. There were two reasons for this.
First, the iterative equations can be easily programmed and do not
require massive amounts of computer time to reach a solution as might
be required with more complicated designs. Second, there is no closed
form solution to these equations so the iterations are nontrivial.
Problems which might possibly occur can occur under this setup as
easily as under any other. This layout seemed to offer the best
possibility for finding whatever problems The Iterative Procedure might
have in the most economical way.

The actual Monte Carlo process was carried out as follows.
Sufficient statistics exist for this model (see Section 6.1) and can
be generated with pseudorandom chi-square generators once I, J, K and
a set of "true" parameters $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ have been chosen. These
statistics were then fed into The Iterative Process. If convergence
occurred, any negative estimates were set to zero and the equations
re-solved with those estimates restricted. When final solutions were
obtained they were recorded and fed into the Hartley-Rao-Vaughn
algorithm to confirm that they were indeed solutions of the likelihood
equations. If 300 iterations occurred without convergence the process
stopped and the sufficient statistics were recorded and later fed into
another program to analyze in more detail why convergence failed to
occur. Over 15,000 separate Monte Carlo repetitions were run for
various I, J, K, $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ combinations. (See below for more
detail on the types of combinations used.)
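
The generation step described above can be sketched as follows. This is a minimal illustration, not the program actually used in the study: it assumes the chi-square distributions of the sums of squares stated in Section 6.1, and the function name `simulate_sufficient_stats`, the argument layout, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

def simulate_sufficient_stats(I, J, K, sigma, rng, size=None):
    """Draw (SS_e, SS_AB, SS_A, SS_B) for the balanced two-way crossed
    random effects model, given "true" variances sigma = (s0, s1, s2, s3)
    for error, interaction, row and column effects (hypothetical layout)."""
    s0, s1, s2, s3 = sigma
    # Each sum of squares is a scaled chi-square variable (Section 6.1).
    ss_e = s0 * rng.chisquare(I * J * (K - 1), size=size)
    ss_ab = (s0 + K * s1) * rng.chisquare((I - 1) * (J - 1), size=size)
    ss_a = (s0 + K * s1 + J * K * s2) * rng.chisquare(I - 1, size=size)
    ss_b = (s0 + K * s1 + I * K * s3) * rng.chisquare(J - 1, size=size)
    return ss_e, ss_ab, ss_a, ss_b
```

One call with `size=None` gives the sufficient statistics for a single Monte Carlo repetition; passing an integer `size` draws many repetitions at once.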

The  results  of  the  Monte  Carlo  analysis  support  the  contention 
that  The  Iterative  Procedure  is  indeed  an  effective,  efficient  method 
for  calculating  maximum  likelihood  estimates  in  the  mixed  model  of  the 
analysis  of  variance.  Two  problems  were  discovered  during  the  Monte 
Carlo  runs,  one  of  which  was  easily  rectified  and  the  other  of  which 
was  more  serious.  These  problems  will  be  discussed  next  followed  by 
the  large  quantity  of  favorable  results  of  the  Monte  Carlo  trials. 

When the Monte Carlo runs were started it was found that sometimes
The Iterative Process seemed not to be converging when in fact it
was. (This occurred less than 0.8% of the time.) The problem was that
the convergence was so slow that it might have taken up to 10,000
iterations to converge. For instance the sequence 11000, 9000, 10999,
9001, 10998, 9002,... is converging to 10000 but will require 2000
iterations to get there at the rate it is going. The Monte Carlo results
seemed to indicate that whenever the slow convergence occurred, it was
in the oscillating manner indicated above. A natural correction for
this problem seemed to be to average consecutive iterations. This
proved to be an excellent idea as it eliminated entirely the problem of
reporting lack of convergence due to slow convergence and improved other
slow convergence rates by a factor of from 10 to 50 or more. However,
care must be taken in applying this technique since, if applied
indiscriminately, it can lead to a false convergence where a sequence of
iterates regenerate themselves by averaging. (This actually occurred
in early studies.) The false convergence can be easily eliminated by
simple programming steps which are described in Appendix C. All
further results reported here are results after the above modification
was made to The Iterative Process.
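
The effect of averaging consecutive iterates on an oscillating sequence can be illustrated with a small sketch. The map `g` below is a made-up one-dimensional stand-in for one step of the process (it is not the likelihood iteration itself), and the safeguards against false convergence described in Appendix C are not reproduced here.

```python
def iterate(g, x0, tol=1e-6, max_iter=300, average=False):
    """Fixed-point iteration x <- g(x), optionally replacing each new
    iterate by the average of the old iterate and g(old iterate)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if average:
            x_new = 0.5 * (x + x_new)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, None  # no convergence within max_iter


def g(x):
    # Hypothetical slowly converging, oscillating map with fixed point 10000:
    # each step flips the sign of the error and shrinks it by only 0.1%.
    return 10000.0 - 0.999 * (x - 10000.0)


plain = iterate(g, 11000.0)
averaged = iterate(g, 11000.0, average=True)
```

In this toy case the plain iteration exhausts the 300-iteration limit, while the averaged iteration cancels most of the oscillation at each step and stops after a handful of iterations.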

A more serious problem which occurred was mentioned briefly in
Section 5.6. When negative estimates were generated in one iteration
in such a configuration that $B(\sigma)$ became singular or nearly singular,
the process became very unstable, often oscillating between several
points of singularity. Once such unstable points were reached
convergence almost never occurred. No totally satisfactory solution to
this problem was found. However, some facts became evident. First,
the problem can only occur when some of the estimates are negative;
for positive estimates convergence occurred 100% of the time. Second,
the problem tended to occur only in situations when I, J and K were
small and when $\sigma_0$, $\sigma_1$, $\sigma_2$, and $\sigma_3$ were chosen to increase the probability
of negative estimates occurring. (For instance if the interaction
variance is large with respect to the row and column effect variances
$\sigma_2$ and $\sigma_3$ or if the error variance $\sigma_0$ is large with respect to $\sigma_1$ then
the probability of this problem occurring seemed to increase.) It is
somewhat reassuring to note that such configurations occur infrequently
in real data.

Although the above problem is serious, it may possibly be overcome
by restarting from a new trial solution. If this fails it is possible
to act as if the final solution had been reached at whatever point the
blow up or oscillation occurred. Any estimates which are negative at
this point are set to zero and the equations re-solved. This second
method is not totally satisfactory but will work in most cases.

The results of the Monte Carlo study not only pointed out two
problems (one of which was completely rectifiable, the other only
partially so) but also produced many encouraging results. The first
positive result was the fact that in over 15,000 runs with various I, J,
K, $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ combinations (see below for more detail), The
Iterative Procedure converged 98.7% of the time when the failures due
to slow convergence are included and 99+% of the time when the
averaging correction is made. In every case where convergence occurred, the
Hartley-Rao-Vaughn algorithm converged in one iteration starting with
the final results of TIP as an initial guess. This verified that in
every case where TIP converged, it converged to a solution of the
likelihood equations.

Another  positive  result  of  the  Monte  Carlo  study  was  that  The 
Iterative  Procedure  seldom  required  a  large  number  of  iterations  to 
reach  convergence.  The  number  of  iterations  seldom  exceeded  20  although 
in  a  few  cases  it  was  over  200.  The  averaging  modification  mentioned 
above  further  reduced  the  number  of  iterations  required.  This  contrasts 
with other procedures which may require a large number of iterations. For
instance, the Hartley-Rao-Vaughn algorithm may require a thousand or more
Runge-Kutta iterations for each approximate solution and there are as
many of these approximate solutions as there are overall iterations.
(It was for this reason that in the Monte Carlo study, the H-R-V
algorithm was used only as a check rather than being allowed to
iterate to a solution.)

It was also noted that if the "true" parameters were chosen so
that negative estimates were unlikely to develop (see above), the
percentage of runs where convergence occurred went up and the number
of iterations went down. As long as no negative estimates occurred,
The Iterative Procedure converged every time. It seems that such a
configuration of parameters is likely to occur in real data.

Another  very  important  aspect  of  The  Iterative  Procedure  which 
was  highlighted  by  the  Monte  Carlo  study  is  that  as  the  size  of  the 
design  increased,  so  did  the  computational  efficiency  of  TIP.  The 
percentage of runs where convergence occurred increased (reaching 100%
for  large  designs).  The  average  number  of  iterations  per  run  decreased 
(getting  very  close  to  the  absolute  minimum  of  2  for  large  designs  and 
attaining  this  minimum  for  very  large  designs). 

The positive results mentioned above are illustrated by the data
in the following two tables. In each table a box represents 100 Monte
Carlo runs at the particular I, J, K and $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ combination.
The number in the upper left of each box is the percentage of runs on
which convergence occurred and the number in the lower right is the
average number of iterations per run for the runs represented by that
box. Table 5.8.1 illustrates what occurs as n becomes large. Two
different I, J, K combinations were studied, one with medium n=60 and
one with large n=540. For large n the percentage of runs converging
was 100% and the average number of iterations was only 2.61. (For the
same $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ combinations with n=6000, the average number of
iterations was 2.0, the absolute minimum.) It is easily seen that
some $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ combinations are more difficult than others (in
terms of number of iterations and to a smaller extent in percentage of
runs converging) but that no combination is too difficult. This is
probably accounted for by the fact that n is not small in either case.
However, note that for all $\sigma$ combinations, there is a marked
improvement upon going from the medium n to the large n case.

Table 5.8.1

Percentage of Runs on which Convergence Occurred and Average
Number of Iterations Per Run on Sets of 100 Monte Carlo
Runs with Various I, J, K, $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ Combinations

[Table body not recoverable from the source scan.]

Table 5.8.2

Percentage of Runs on which Convergence Occurred and Average
Number of Iterations Per Run on Sets of 100 Monte Carlo
Runs with Various I, J, K, $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ Combinations

[Table body not recoverable from the source scan.]

Table 5.8.2 illustrates results for several cases of small n (n ≤ 40).
It becomes apparent that there are no startling differences among the
various I, J, K combinations although there seems to be a trend toward
doing better if instead of n=IJK, only IJ is considered. (This leads
to a suspicion that it may not be n but the $m_i$ that are really important.)
Increasing K did not seem to help. The trend is most easily seen in
the decrease in the average number of iterations. Although there were
not many differences in I, J, K combinations, there were noticeable
differences among the $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ combinations. Those for which $\sigma_1$
is not small relative to $\sigma_2$ and $\sigma_3$ (the first, second, and fourth) seem
to do worse than the others both in terms of percentage of runs
converging and average number of iterations. This agrees with the ideas advanced
above. When the estimates were to be positive the convergence was swift
and sure. One other pleasant aspect of Table 5.8.2 is that even for
these small designs the overall average percentage of runs converging
was 98.8%.

As a summary of the Monte Carlo results the following can be said.
It has not yet been proved in a theorem that any of the desirable
properties mentioned in Section 5.4 are true for The Iterative Procedure.
However, the following have been demonstrated.

1) The method usually converges (99+% of the time) and when it
does it converges to a solution of the likelihood equations.

2) As the size of the design gets larger the method becomes
more computationally efficient and even for small designs it
is quite efficient.

3) If the parameters being estimated are configured well in the
sense mentioned above, as they often are in real data, then
the method is more effective.

4) The method does have a problem with negative estimates which
can be partially overcome by the methods mentioned above.

In short, although it is not perfect, The Iterative Procedure seems to
be a highly efficient, effective algorithm for solving the problems it
was intended to solve.

CHAPTER 6


EXAMPLES 


6.1. An Example of Application of Asymptotic Theory -- The Two-Way
Crossed Balanced Random Effects Model

The model used here is given first in the form conventionally used
in the analysis of variance.

$$y_{ijk} = \mu + a_i + b_j + c_{ij} + e_{ijk}, \qquad i=1,2,\ldots,I, \quad j=1,2,\ldots,J, \quad k=1,2,\ldots,K.$$

Here $\mu$ is the overall mean effect and the a's, b's, c's and e's are
random variables. Now list the y's in lexicographic order (see example
below) as a vector to get the following model.

$$\mathbf{y} = \mathbf{X}\alpha + \mathbf{U}_1\mathbf{b}_1 + \mathbf{U}_2\mathbf{b}_2 + \mathbf{U}_3\mathbf{b}_3 + \mathbf{e},$$

where

$\mathbf{y}$ is n×1, n = IJK,
$\mathbf{X}$ is an n×1 vector of 1's,
$\alpha$ is a 1×1 unknown constant,
$\mathbf{U}_1$ is an n×IJ standard design matrix for the A×B interactions,
$\mathbf{U}_2$ is an n×I standard design matrix for the A effects,
$\mathbf{U}_3$ is an n×J standard design matrix for the B effects,
$\mathbf{b}_1$ is an IJ×1 random vector containing the A×B random interactions,
$\mathbf{b}_2$ is an I×1 random vector containing the A random effects,
$\mathbf{b}_3$ is a J×1 random vector containing the B random effects,
$\mathbf{e}$ is an n×1 random vector containing the errors.

The parameters and constants in this model and their correspondence to
the usual analysis of variance set up are given below. (ANOVA stands
for analysis of variance.)


Parameter        Corresponding ANOVA Parameter
$\alpha$         $\mu$
$\sigma_0$       $\sigma_e^2$
$\sigma_1$       $\sigma_{AB}^2$
$\sigma_2$       $\sigma_A^2$
$\sigma_3$       $\sigma_B^2$

Constant         Corresponding ANOVA Constant
n                IJK
$p_0$            1
$p_1$            3
p                5
$m_1$            IJ
$m_2$            I
$m_3$            J

The actual form of the likelihood equations will be given later
in this section. At this point $\mathbf{y}$, $\mathbf{X}$, $\mathbf{U}_1$, $\mathbf{U}_2$, $\mathbf{U}_3$ for the case I=2, J=3,
K=2 are given as illustrations.

The matrices for i=0,2,3 are

The asymptotic theory for this model will now be set up. Let I
and J both approach infinity in such a way that $I/J \to \rho$, where $0 < \rho < \infty$ is
fixed. Then the matrix of the $\rho_{ij}$ defined in Section 4.2 is

[Matrix not recoverable from the source scan.]

It can be seen by inspection of the above matrix that c=2 and the sets
$S_s$, s=0,1,2 are $S_0 = \{0\}$, $S_1 = \{1\}$, $S_2 = \{2,3\}$, where c and the $S_s$ are
defined in Section 4.2. Then it can be seen that

$$\operatorname{rank}(\mathbf{U}_1:\mathbf{U}_2:\mathbf{U}_3) = IJ,$$
$$\operatorname{rank}(\mathbf{U}_2:\mathbf{U}_3) = I+J-1,$$
$$\operatorname{rank}(\mathbf{U}_2) = I,$$
$$\operatorname{rank}(\mathbf{U}_3) = J.$$

Therefore

$$\nu_0 = n - \operatorname{rank}(\mathbf{U}_1:\mathbf{U}_2:\mathbf{U}_3) = IJK - IJ = IJ(K-1),$$
$$\nu_1 = \operatorname{rank}(\mathbf{U}_1:\mathbf{U}_2:\mathbf{U}_3) - \operatorname{rank}(\mathbf{U}_2:\mathbf{U}_3) = IJ - (I+J-1) = IJ - I - J + 1 = (I-1)(J-1),$$
$$\nu_2 = \operatorname{rank}(\mathbf{U}_2:\mathbf{U}_3) - \operatorname{rank}(\mathbf{U}_3) = I+J-1-J = I-1,$$
$$\nu_3 = \operatorname{rank}(\mathbf{U}_2:\mathbf{U}_3) - \operatorname{rank}(\mathbf{U}_2) = I+J-1-I = J-1.$$

The columns making up $\mathbf{H}_0$, $\mathbf{H}_1$, and $\mathbf{H}_2$ are noted under the matrix $\mathbf{P}$.
It is easily seen that

$$\operatorname{rank}(\mathbf{H}_0) = IJ(K-1) = 2\cdot 3\cdot 1 = 6,$$
$$\operatorname{rank}(\mathbf{H}_1) = (I-1)(J-1) = 1\cdot 2 = 2,$$

and

$$\operatorname{rank}(\mathbf{H}_2) = I+J-1 = 2+3-1 = 4.$$


It  is  also  easy  to  verify  that  the  following  matrices  are  zero  matrices. 

.S&>-2>  2&>-2>  2§*o  =  2> 

2iSo  =  2’  SgSo  =  2’  2^o  =  2’ 

=  0,  U'lL  =  0  ^ 

SgSi  =  2’  2^1  =  2- 

The  above  illustrates  the  concepts  in  Section  4.2  for  this  case. 

Now proceed to actually write down the likelihood equations for this
model. It turns out that all the $\mathbf{G}_i$ matrices can be diagonalized by a
single orthogonal matrix and that $\mathbf{X}$ is a characteristic vector of each
$\mathbf{G}_i$. As noted in Section 5.3, this greatly simplifies the likelihood
equations. In fact,

$$\mathbf{G}_0 = (\mathbf{I}_I \times \mathbf{I}_J \times \mathbf{I}_K), \quad \mathbf{G}_1 = (\mathbf{I}_I \times \mathbf{I}_J \times \mathbf{E}_K),$$
$$\mathbf{G}_2 = (\mathbf{I}_I \times \mathbf{E}_J \times \mathbf{E}_K), \quad \text{and} \quad \mathbf{G}_3 = (\mathbf{E}_I \times \mathbf{I}_J \times \mathbf{E}_K),$$

where × denotes the Kronecker or direct product of two matrices,
$\mathbf{I}_I$, $\mathbf{I}_J$ and $\mathbf{I}_K$ are identity matrices, and
$\mathbf{E}_I$, $\mathbf{E}_J$ and $\mathbf{E}_K$ are square matrices whose elements
are all 1 of sizes I, J and K respectively. The matrix that diagonalizes
them all simultaneously is $\mathbf{P} = (\mathbf{P}_I \times \mathbf{P}_J \times \mathbf{P}_K)$ where $\mathbf{P}_I$, $\mathbf{P}_J$ and $\mathbf{P}_K$ are any
orthogonal matrices with first column proportional to a vector of ones
of size I, J and K respectively.
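
The Kronecker structure and the simultaneous diagonalization can be checked numerically for the illustrative case I=2, J=3, K=2. The sketch below is only a verification aid: the helper names are invented for the example, and the particular orthogonal matrices (obtained here from a QR factorization) are one arbitrary choice among the many allowed by the text.

```python
import numpy as np

def ortho_with_ones(m):
    """An orthogonal m x m matrix whose first column is proportional to ones.
    QR of [1, e_2, ..., e_m] keeps the first column's direction."""
    a = np.column_stack([np.ones(m), np.eye(m)[:, 1:]])
    q, _ = np.linalg.qr(a)
    return q

def kron3(a, b, c):
    """Kronecker product of three matrices, matching lexicographic (i, j, k) order."""
    return np.kron(np.kron(a, b), c)

def is_diagonal(m):
    return np.allclose(m, np.diag(np.diag(m)))

I, J, K = 2, 3, 2
eye = np.eye
ones = lambda m: np.ones((m, m))

G = [
    kron3(eye(I), eye(J), eye(K)),    # G_0
    kron3(eye(I), eye(J), ones(K)),   # G_1 (A x B interactions)
    kron3(eye(I), ones(J), ones(K)),  # G_2 (A effects)
    kron3(ones(I), eye(J), ones(K)),  # G_3 (B effects)
]
P = kron3(ortho_with_ones(I), ortho_with_ones(J), ortho_with_ones(K))
X = np.ones(I * J * K)

# Design matrix for the interactions, so that G_1 = U_1 U_1'.
U1 = kron3(eye(I), eye(J), np.ones((K, 1)))

all_diagonal = all(is_diagonal(P.T @ g @ P) for g in G)
eigenvalues_for_X = [1, K, J * K, I * K]  # G_i X = eigenvalue * X
```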

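Using the above diagonalization and the fact that $\mathbf{X}$ is a
characteristic vector of each $\mathbf{G}_i$ it can be shown that the estimate of $\alpha$ can be
made independent of the estimate of $\sigma$. The likelihood equations are
written using the following terminology from the analysis of variance.

$$\bar y_{...} = \frac{1}{IJK}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} y_{ijk}, \qquad \bar y_{.j.} = \frac{1}{IK}\sum_{i=1}^{I}\sum_{k=1}^{K} y_{ijk},$$

$$\bar y_{i..} = \frac{1}{JK}\sum_{j=1}^{J}\sum_{k=1}^{K} y_{ijk}, \qquad \bar y_{ij.} = \frac{1}{K}\sum_{k=1}^{K} y_{ijk},$$

$$SS_T = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} (y_{ijk} - \bar y_{...})^2, \qquad SS_e = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} (y_{ijk} - \bar y_{ij.})^2,$$

$$SS_A = JK\sum_{i=1}^{I} (\bar y_{i..} - \bar y_{...})^2, \qquad SS_B = IK\sum_{j=1}^{J} (\bar y_{.j.} - \bar y_{...})^2,$$

$$SS_{AB} = K\sum_{i=1}^{I}\sum_{j=1}^{J} (\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2.$$
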

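The likelihood equations are

$$\frac{IJK}{\sigma_0+K\sigma_1+JK\sigma_2+IK\sigma_3}\,\alpha = \frac{IJK}{\sigma_0+K\sigma_1+JK\sigma_2+IK\sigma_3}\,\bar y_{...},$$

$$\frac{1}{\sigma_0+K\sigma_1+JK\sigma_2+IK\sigma_3} + \frac{I-1}{\sigma_0+K\sigma_1+JK\sigma_2} + \frac{J-1}{\sigma_0+K\sigma_1+IK\sigma_3} + \frac{(I-1)(J-1)}{\sigma_0+K\sigma_1} + \frac{IJ(K-1)}{\sigma_0}$$
$$\qquad = \frac{SS_A}{(\sigma_0+K\sigma_1+JK\sigma_2)^2} + \frac{SS_B}{(\sigma_0+K\sigma_1+IK\sigma_3)^2} + \frac{SS_{AB}}{(\sigma_0+K\sigma_1)^2} + \frac{SS_e}{\sigma_0^2},$$

$$\frac{K}{\sigma_0+K\sigma_1+JK\sigma_2+IK\sigma_3} + \frac{K(I-1)}{\sigma_0+K\sigma_1+JK\sigma_2} + \frac{K(J-1)}{\sigma_0+K\sigma_1+IK\sigma_3} + \frac{K(I-1)(J-1)}{\sigma_0+K\sigma_1}$$
$$\qquad = \frac{K(SS_A)}{(\sigma_0+K\sigma_1+JK\sigma_2)^2} + \frac{K(SS_B)}{(\sigma_0+K\sigma_1+IK\sigma_3)^2} + \frac{K(SS_{AB})}{(\sigma_0+K\sigma_1)^2},$$

$$\frac{JK}{\sigma_0+K\sigma_1+JK\sigma_2+IK\sigma_3} + \frac{JK(I-1)}{\sigma_0+K\sigma_1+JK\sigma_2} = \frac{JK(SS_A)}{(\sigma_0+K\sigma_1+JK\sigma_2)^2},$$

$$\frac{IK}{\sigma_0+K\sigma_1+JK\sigma_2+IK\sigma_3} + \frac{IK(J-1)}{\sigma_0+K\sigma_1+IK\sigma_3} = \frac{IK(SS_B)}{(\sigma_0+K\sigma_1+IK\sigma_3)^2}.$$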
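
The sums of squares entering these equations are straightforward to compute from a data array. The following sketch assumes the observations are stored as a NumPy array `y` of shape (I, J, K) indexed as $y_{ijk}$; the function name and return order are choices made for this example only.

```python
import numpy as np

def anova_sums_of_squares(y):
    """Return (SS_e, SS_A, SS_B, SS_AB) for a balanced two-way layout
    stored as an array of shape (I, J, K)."""
    I, J, K = y.shape
    grand = y.mean()              # overall mean
    row = y.mean(axis=(1, 2))     # A-level means, length I
    col = y.mean(axis=(0, 2))     # B-level means, length J
    cell = y.mean(axis=2)         # cell means, shape (I, J)
    ss_e = ((y - cell[:, :, None]) ** 2).sum()
    ss_a = J * K * ((row - grand) ** 2).sum()
    ss_b = I * K * ((col - grand) ** 2).sum()
    ss_ab = K * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
    return ss_e, ss_a, ss_b, ss_ab
```

A convenient sanity check is that the four quantities add up to the total sum of squares about the overall mean.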


No  closed  form  solution  exists.  The  iterative  solution  is  given 
in  Section  6.2.  The  matrix  J  defined  by 


(J).  .=  lim  k(- 

n.n.  L  0\ 


SX(Xs0) 


1  0 


B0ia93  grSo 


^~|  is  calculated  in  the  following 


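manner.

It is true that

$$\mathbf{X}'\boldsymbol{\Sigma}_0^{-1}\mathbf{X} = \frac{IJK}{\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03}}.$$

Therefore

$$J = \lim \frac{1}{I}\,\mathbf{X}'\boldsymbol{\Sigma}_0^{-1}\mathbf{X} = \lim \frac{1}{I}\,\frac{IJK}{\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03}} = \frac{1}{\sigma_{02}+\rho\,\sigma_{03}}.$$

Now $\mathbf{C}_1$, whose (i,j)th element is given by $\frac{1}{2}\lim \frac{1}{n_i n_j}\operatorname{tr}\boldsymbol{\Sigma}_0^{-1}\mathbf{G}_i\boldsymbol{\Sigma}_0^{-1}\mathbf{G}_j$,
i,j=0,1,2,3, must be calculated. The following table will suffice.

Table 6.1.1

For each pair (i, j) the first expression is $\frac{1}{n_i n_j}\operatorname{tr}\boldsymbol{\Sigma}_0^{-1}\mathbf{G}_i\boldsymbol{\Sigma}_0^{-1}\mathbf{G}_j$ and the second is its limit.

(0, 0):
$$\frac{1}{IJ(K-1)}\left[\frac{1}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{I-1}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02})^2} + \frac{J-1}{(\sigma_{00}+K\sigma_{01}+IK\sigma_{03})^2} + \frac{(I-1)(J-1)}{(\sigma_{00}+K\sigma_{01})^2} + \frac{IJ(K-1)}{\sigma_{00}^2}\right] \to \frac{1}{\sigma_{00}^2} + \frac{1}{(K-1)(\sigma_{00}+K\sigma_{01})^2}$$

(0, 1):
$$\frac{1}{[IJ(K-1)(I-1)(J-1)]^{1/2}}\left[\frac{K}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{K(I-1)}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02})^2} + \frac{K(J-1)}{(\sigma_{00}+K\sigma_{01}+IK\sigma_{03})^2} + \frac{K(I-1)(J-1)}{(\sigma_{00}+K\sigma_{01})^2}\right] \to \frac{K}{(K-1)^{1/2}(\sigma_{00}+K\sigma_{01})^2}$$

(0, 2):
$$\frac{1}{[IJ(K-1)(I-1)]^{1/2}}\left[\frac{JK}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{JK(I-1)}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02})^2}\right] \to 0$$

(0, 3):
$$\frac{1}{[IJ(K-1)(J-1)]^{1/2}}\left[\frac{IK}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{IK(J-1)}{(\sigma_{00}+K\sigma_{01}+IK\sigma_{03})^2}\right] \to 0$$

(1, 1):
$$\frac{1}{(I-1)(J-1)}\left[\frac{K^2}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{K^2(I-1)}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02})^2} + \frac{K^2(J-1)}{(\sigma_{00}+K\sigma_{01}+IK\sigma_{03})^2} + \frac{K^2(I-1)(J-1)}{(\sigma_{00}+K\sigma_{01})^2}\right] \to \frac{K^2}{(\sigma_{00}+K\sigma_{01})^2}$$

(1, 2):
$$\frac{1}{[(I-1)^2(J-1)]^{1/2}}\left[\frac{JK^2}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{JK^2(I-1)}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02})^2}\right] \to 0$$

(1, 3):
$$\frac{1}{[(I-1)(J-1)^2]^{1/2}}\left[\frac{IK^2}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{IK^2(J-1)}{(\sigma_{00}+K\sigma_{01}+IK\sigma_{03})^2}\right] \to 0$$

(2, 2):
$$\frac{1}{I-1}\left[\frac{J^2K^2}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{J^2K^2(I-1)}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02})^2}\right] \to \frac{1}{\sigma_{02}^2}$$

(2, 3):
$$\frac{1}{[(I-1)(J-1)]^{1/2}}\left[\frac{IJK^2}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2}\right] \to 0$$

(3, 3):
$$\frac{1}{J-1}\left[\frac{I^2K^2}{(\sigma_{00}+K\sigma_{01}+JK\sigma_{02}+IK\sigma_{03})^2} + \frac{I^2K^2(J-1)}{(\sigma_{00}+K\sigma_{01}+IK\sigma_{03})^2}\right] \to \frac{1}{\sigma_{03}^2}$$

Thus it follows that

$$\mathbf{C}_1 = \frac{1}{2}\begin{pmatrix}
\dfrac{1}{\sigma_{00}^2} + \dfrac{1}{(K-1)(\sigma_{00}+K\sigma_{01})^2} & \dfrac{K}{(K-1)^{1/2}(\sigma_{00}+K\sigma_{01})^2} & 0 & 0 \\
\dfrac{K}{(K-1)^{1/2}(\sigma_{00}+K\sigma_{01})^2} & \dfrac{K^2}{(\sigma_{00}+K\sigma_{01})^2} & 0 & 0 \\
0 & 0 & \dfrac{1}{\sigma_{02}^2} & 0 \\
0 & 0 & 0 & \dfrac{1}{\sigma_{03}^2}
\end{pmatrix}.$$

Hence

$$\mathbf{C}_1^{-1} = \begin{pmatrix}
2\sigma_{00}^2 & -\dfrac{2\sigma_{00}^2}{K(K-1)^{1/2}} & 0 & 0 \\
-\dfrac{2\sigma_{00}^2}{K(K-1)^{1/2}} & 2\sigma_{01}^2 + \dfrac{2\sigma_{00}^2}{K(K-1)} + \dfrac{4\sigma_{00}\sigma_{01}}{K} & 0 & 0 \\
0 & 0 & 2\sigma_{02}^2 & 0 \\
0 & 0 & 0 & 2\sigma_{03}^2
\end{pmatrix}.$$
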
The $\mathbf{C}_1^{-1}$ obtained above may be compared with the asymptotic covariance
matrix obtained when the usual analysis of variance estimators are
normalized by the same normalizing sequences. Recall that the usual
analysis of variance estimators are

$$\tilde\sigma_0 = \frac{SS_e}{IJ(K-1)}, \qquad \tilde\sigma_1 = \frac{1}{K}\left[\frac{SS_{AB}}{(I-1)(J-1)} - \frac{SS_e}{IJ(K-1)}\right],$$

$$\tilde\sigma_2 = \frac{1}{JK}\left[\frac{SS_A}{I-1} - \frac{SS_{AB}}{(I-1)(J-1)}\right], \qquad \tilde\sigma_3 = \frac{1}{IK}\left[\frac{SS_B}{J-1} - \frac{SS_{AB}}{(I-1)(J-1)}\right],$$

and that

$$SS_e \sim \sigma_{00}\,\chi^2_{IJ(K-1)}, \qquad SS_A \sim (\sigma_{00}+K\sigma_{01}+JK\sigma_{02})\,\chi^2_{(I-1)},$$

$$SS_{AB} \sim (\sigma_{00}+K\sigma_{01})\,\chi^2_{(I-1)(J-1)}, \qquad SS_B \sim (\sigma_{00}+K\sigma_{01}+IK\sigma_{03})\,\chi^2_{(J-1)},$$

and that all sums of squares are independent. One discovers using the
above that the asymptotic covariance matrix of the $\tilde\sigma$'s is just $\mathbf{C}_1^{-1}$.
Thus the maximum likelihood and the usual ANOVA estimates are
asymptotically equivalent in this case. Note that in either the usual ANOVA
situation or the maximum likelihood situation had an attempt been made
to normalize each estimate by the same sequence (for instance $n^{1/2}$),
something would have gone wrong no matter what sequence was tried.
That is, one of the asymptotic variances would have been zero or
infinity.

To make this example totally complete would require a demonstration
that the various lemmas in Chapter 4 are true in this case (which can
be done in a straightforward manner by brute force). This will not be
done here. It has been shown how the concepts of Chapter 4 apply in
this case. In the next section the actual computation of the estimates
is illustrated for this model.


6.2. Example of the Iterative Procedure -- The 2-Way Crossed Balanced
Random Effects Model

The specifications for the model and the likelihood equations have
already been developed in Section 6.1. The equation for $\alpha$ may be
solved independent of the estimates of the $\sigma_i$'s yielding $\hat\alpha = \bar y_{...}$.
The equations needed to iterate and solve for $\sigma_0$, $\sigma_1$, $\sigma_2$, $\sigma_3$ are of the
form $\mathbf{B}(\boldsymbol\sigma^{(k)})\,\boldsymbol\sigma^{(k+1)} = \mathbf{c}(\boldsymbol\sigma^{(k)})$ where $\mathbf{B}(\boldsymbol\sigma^{(k)})$ is a matrix whose elements
are given in Table 6.1.1 with $\sigma_i^{(k)}$ substituted for $\sigma_{0i}$, i=0,1,2,3 and
$\mathbf{c}(\boldsymbol\sigma^{(k)})$ is a 4×1 vector consisting of the right hand sides of the
likelihood equations given in Section 6.1 with $\sigma_i^{(k)}$ substituted for
$\sigma_i$, i=0,1,2,3. After much algebraic manipulation these equations can
be solved to yield the following iterative equations. (Define

$$MS_e = \frac{SS_e}{IJ(K-1)}, \quad MS_A = \frac{SS_A}{I-1}, \quad MS_B = \frac{SS_B}{J-1}, \quad MS_{AB} = \frac{SS_{AB}}{(I-1)(J-1)}.)$$

$$\sigma_0^{(k+1)} = MS_e,$$

$$\sigma_1^{(k+1)} = \frac{1}{K}\left[MS_{AB} - MS_e\right] + \frac{1}{K}\left[(\sigma_0^{(k)}+K\sigma_1^{(k)})^2\right]\frac{MS_A+MS_B-MS_{AB}}{\Delta^{(k)}},$$

$$\sigma_2^{(k+1)} = \frac{1}{JK}\left[MS_A - MS_{AB}\right] - \frac{1}{JK}\left[(\sigma_0^{(k)}+K\sigma_1^{(k)})^2 + (J-1)(\sigma_0^{(k)}+K\sigma_1^{(k)}+JK\sigma_2^{(k)})^2\right]\frac{MS_A+MS_B-MS_{AB}}{\Delta^{(k)}},$$

$$\sigma_3^{(k+1)} = \frac{1}{IK}\left[MS_B - MS_{AB}\right] - \frac{1}{IK}\left[(\sigma_0^{(k)}+K\sigma_1^{(k)})^2 + (I-1)(\sigma_0^{(k)}+K\sigma_1^{(k)}+IK\sigma_3^{(k)})^2\right]\frac{MS_A+MS_B-MS_{AB}}{\Delta^{(k)}},$$

where

$$\Delta^{(k)} = (\sigma_0^{(k)}+K\sigma_1^{(k)})^2 + (J-1)(\sigma_0^{(k)}+K\sigma_1^{(k)}+JK\sigma_2^{(k)})^2 + (I-1)(\sigma_0^{(k)}+K\sigma_1^{(k)}+IK\sigma_3^{(k)})^2 + (I-1)(J-1)(\sigma_0^{(k)}+K\sigma_1^{(k)}+JK\sigma_2^{(k)}+IK\sigma_3^{(k)})^2.$$

Observe that at each stage the new iterated estimator is the usual
analysis of variance (method of moments) estimator plus a correction term.
The correction term is a product of the quantity $MS_A+MS_B-MS_{AB}$ and a
term which depends on the old estimates which were iterated. If in
fact $MS_A+MS_B-MS_{AB}$ is zero then there is no correction and the maximum
likelihood and usual estimates coincide. The reason for this is that
if $MS_A+MS_B-MS_{AB}=0$, the likelihood equations admit an explicit solution
and therefore, as was shown in Section 5.5, the iterative process must
yield this exact solution in one iteration. This procedure is easy to
program on a computer or programmable calculator and may be used to
easily compare the two types of estimates.

No closed form expression is possible for the maximum likelihood
estimators so their behavior cannot be studied directly in this case.
A case where explicit solutions do exist and where the behavior of the
estimates can be studied directly is presented in the next section.
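
The iteration $\mathbf{B}(\boldsymbol\sigma^{(k)})\,\boldsymbol\sigma^{(k+1)} = \mathbf{c}(\boldsymbol\sigma^{(k)})$ for this model can be sketched in a few lines. The code below is an illustration under stated assumptions, not the program of Appendix C: it builds $\mathbf{B}$ from the unnormalized traces $\operatorname{tr}\boldsymbol\Sigma^{-1}\mathbf{G}_i\boldsymbol\Sigma^{-1}\mathbf{G}_j$ using the characteristic values of the $\mathbf{G}_i$, starts from the usual analysis of variance estimates, and omits both the averaging modification and any handling of negative estimates. All function and variable names are invented for the example.

```python
import numpy as np

def _strata(I, J, K, ss_e, ss_ab, ss_a, ss_b):
    """Degrees of freedom, sums of squares and G_i eigenvalues by stratum.
    Strata order: mean, A, B, AB, error."""
    d = np.array([1.0, I - 1, J - 1, (I - 1) * (J - 1), I * J * (K - 1)])
    # The mean stratum contributes 0 because alpha is set to the grand mean.
    ss = np.array([0.0, ss_a, ss_b, ss_ab, ss_e])
    # g[i, s] = characteristic value of G_i on stratum s (rows: sigma_0..sigma_3).
    g = np.array([
        [1, 1, 1, 1, 1],
        [K, K, K, K, 0],
        [J * K, J * K, 0, 0, 0],
        [I * K, 0, I * K, 0, 0],
    ], dtype=float)
    return d, ss, g

def likelihood_residuals(I, J, K, ss_e, ss_ab, ss_a, ss_b, sigma):
    """Left minus right sides of the four likelihood equations for sigma."""
    d, ss, g = _strata(I, J, K, ss_e, ss_ab, ss_a, ss_b)
    lam = g.T @ sigma  # characteristic values of Sigma by stratum
    return g @ (d / lam) - g @ (ss / lam ** 2)

def iterate_two_way(I, J, K, ss_e, ss_ab, ss_a, ss_b, tol=1e-10, max_iter=300):
    """Iterate B(sigma_k) sigma_{k+1} = c(sigma_k); returns (sigma, iterations),
    with iterations = None if max_iter is reached."""
    d, ss, g = _strata(I, J, K, ss_e, ss_ab, ss_a, ss_b)
    ms_a, ms_b, ms_ab, ms_e = ss[1:] / d[1:]
    # Starting values: the usual analysis of variance estimates.
    sigma = np.array([ms_e, (ms_ab - ms_e) / K,
                      (ms_a - ms_ab) / (J * K), (ms_b - ms_ab) / (I * K)])
    for k in range(1, max_iter + 1):
        lam = g.T @ sigma
        B = (g * (d / lam ** 2)) @ g.T   # B_ij = tr(Sigma^-1 G_i Sigma^-1 G_j)
        c = g @ (ss / lam ** 2)          # right sides of the likelihood equations
        new = np.linalg.solve(B, c)
        if np.max(np.abs(new - sigma)) < tol:
            return new, k
        sigma = new
    return sigma, None

sigma_hat, n_iter = iterate_two_way(4, 5, 3, ss_e=40.0, ss_ab=48.0, ss_a=102.0, ss_b=88.0)
```

The sums of squares in the last line are arbitrary illustrative values chosen so that all estimates stay positive; `likelihood_residuals` can then be used to confirm that the limit satisfies the likelihood equations of Section 6.1.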
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Equations  Exist — The  Two-Way  Balanced  Nested  Layout  as  a  Mixed  Model 
This model, given in the usual analysis of variance notation, is

$$y_{ijk} = a_i + b_{ij} + e_{ijk}, \qquad i=1,2,\ldots,I; \quad j=1,2,\ldots,J; \quad k=1,2,\ldots,K;$$

where $y_{ijk}$ is the observation, $a_i$ is an unknown fixed effect and the
$b_{ij}$ are independent, identically distributed as $N(0,\sigma_B^2)$ and the $e_{ijk}$
are independent, identically distributed as $N(0,\sigma_e^2)$ and the $b_{ij}$ and
$e_{ijk}$ are independent. Listing the y's in lexicographic order as in
Section 6.1 the following model is obtained.

$$\mathbf{y} = \mathbf{X}\boldsymbol\alpha + \mathbf{U}_1\mathbf{b}_1 + \mathbf{e},$$

where

$\mathbf{y}$ is n×1, n=IJK,
$\mathbf{X}$ is an n×I standard design matrix for A effects,
$\boldsymbol\alpha$ is an I×1 vector of unknown constants,
$\mathbf{U}_1$ is an n×IJ standard design matrix for interactions,
$\mathbf{b}_1$ is an IJ×1 random vector containing the random effects for this
model,
$\mathbf{e}$ is an n×1 random vector containing the errors.

The  parameters  and  constants  in  this  model  and  their  corresponding 
parameters and constants in the usual analysis of variance set up are

1  0  0  0  0  0
1  0  0  0  0  0
0  1  0  0  0  0
0  1  0  0  0  0
0  0  1  0  0  0
0  0  1  0  0  0
0  0  0  1  0  0
0  0  0  1  0  0
0  0  0  0  1  0
0  0  0  0  1  0
0  0  0  0  0  1
0  0  0  0  0  1


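It can be shown that $\mathbf{G}_0$ and $\mathbf{G}_1$ are diagonalizable and that $\mathbf{X}$
consists of characteristic vectors of $\mathbf{G}_0$ and $\mathbf{G}_1$ with each column of $\mathbf{X}$
having the same characteristic value. Using this information it is easy
to show that the likelihood equations are as follows. For $\boldsymbol\alpha$,
$(\mathbf{X}'\boldsymbol\Sigma^{-1}\mathbf{X})\boldsymbol\alpha = \mathbf{X}'\boldsymbol\Sigma^{-1}\mathbf{y}$ reduces to $(\sigma_0+K\sigma_1)^{-1}JK\alpha_i = (\sigma_0+K\sigma_1)^{-1}JK\bar y_{i..}$,
i=1,2,...,I ($\bar y_{i..}$ defined in Section 6.1) which yields $\hat\alpha_i = \bar y_{i..}$
independent of the choice of $\sigma_0$ or $\sigma_1$, i=1,2,...,I. Let

$$SS_B = K\sum_{i=1}^{I}\sum_{j=1}^{J} (\bar y_{ij.} - \bar y_{i..})^2.$$

Then the equations for $\sigma_0$ and $\sigma_1$ are

$$\frac{IJ}{\sigma_0+K\sigma_1} + \frac{IJ(K-1)}{\sigma_0} = \frac{SS_B}{(\sigma_0+K\sigma_1)^2} + \frac{SS_e}{\sigma_0^2},$$

$$\frac{IJK}{\sigma_0+K\sigma_1} = \frac{K(SS_B)}{(\sigma_0+K\sigma_1)^2}.$$

It is easy to solve these equations to obtain

$$\hat\sigma_0 = \frac{SS_e}{IJ(K-1)} = MS_e, \qquad \hat\sigma_1 = \frac{1}{K}\left[\frac{SS_B}{IJ} - MS_e\right].$$

To show that the iteration $\mathbf{B}(\boldsymbol\sigma^{(k)})\,\boldsymbol\sigma^{(k+1)} = \mathbf{c}(\boldsymbol\sigma^{(k)})$ yields these
solutions in one iteration, set up $\mathbf{B}(\boldsymbol\sigma^{(k)})$ and $\mathbf{c}(\boldsymbol\sigma^{(k)})$ as follows.

$$\mathbf{B}(\boldsymbol\sigma^{(k)}) = \begin{pmatrix}
\dfrac{IJ}{(\sigma_0^{(k)}+K\sigma_1^{(k)})^2} + \dfrac{IJ(K-1)}{(\sigma_0^{(k)})^2} & \dfrac{IJK}{(\sigma_0^{(k)}+K\sigma_1^{(k)})^2} \\
\dfrac{IJK}{(\sigma_0^{(k)}+K\sigma_1^{(k)})^2} & \dfrac{IJK^2}{(\sigma_0^{(k)}+K\sigma_1^{(k)})^2}
\end{pmatrix}, \qquad
\mathbf{c}(\boldsymbol\sigma^{(k)}) = \begin{pmatrix}
\dfrac{SS_B}{(\sigma_0^{(k)}+K\sigma_1^{(k)})^2} + \dfrac{SS_e}{(\sigma_0^{(k)})^2} \\
\dfrac{K(SS_B)}{(\sigma_0^{(k)}+K\sigma_1^{(k)})^2}
\end{pmatrix}.$$

Solution of these equations for any $\sigma_0^{(k)}$ and $\sigma_1^{(k)}$ where $\sigma_0^{(k)} > 0$ and
$\sigma_0^{(k)}+K\sigma_1^{(k)} \neq 0$ yields exactly $\hat\sigma_0$ and $\hat\sigma_1$ given above.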
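
The one-iteration property is easy to confirm numerically. The sketch below solves the 2×2 system once from an arbitrary admissible starting point; the function name and the numerical values are illustrative assumptions only.

```python
import numpy as np

def nested_one_step(I, J, K, ss_b, ss_e, sigma0, sigma1):
    """One step of B(sigma_k) sigma_{k+1} = c(sigma_k) for the nested mixed model."""
    lam = sigma0 + K * sigma1
    B = np.array([
        [I * J / lam ** 2 + I * J * (K - 1) / sigma0 ** 2, I * J * K / lam ** 2],
        [I * J * K / lam ** 2, I * J * K ** 2 / lam ** 2],
    ])
    c = np.array([ss_b / lam ** 2 + ss_e / sigma0 ** 2, K * ss_b / lam ** 2])
    return np.linalg.solve(B, c)

# Arbitrary illustrative inputs and an arbitrary starting point.
I, J, K, ss_b, ss_e = 2, 3, 2, 30.0, 12.0
step = nested_one_step(I, J, K, ss_b, ss_e, sigma0=0.7, sigma1=3.1)

# Explicit solutions from the text.
ms_e = ss_e / (I * J * (K - 1))
explicit = np.array([ms_e, (ss_b / (I * J) - ms_e) / K])
```

Whatever admissible starting values are used, `step` agrees with `explicit` up to rounding error.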


The asymptotic theory for this case is easy since the
non-asymptotic distributions of all the estimators are known. It is well
known that $SS_e \sim \sigma_{00}\,\chi^2_{IJ(K-1)}$, $SS_B \sim (\sigma_{00}+K\sigma_{01})\,\chi^2_{I(J-1)}$,
$\hat{\boldsymbol\alpha} \sim N\!\left(\boldsymbol\alpha, \frac{\sigma_{00}+K\sigma_{01}}{JK}\,\mathbf{I}\right)$, and that $\hat{\boldsymbol\alpha}$ is independent of both $SS_B$ and
$SS_e$. From this it follows that $\hat{\boldsymbol\alpha}$ and $\hat\sigma_0$ are unbiased but that $\hat\sigma_1$ is
biased with expected value $\left(1 - \frac{1}{J}\right)\sigma_{01} - \frac{1}{JK}\sigma_{00}$. Furthermore,
the variance-covariance matrix of $(\hat{\boldsymbol\alpha}', \hat\sigma_0, \hat\sigma_1)'$ is

Two cases of asymptotic behavior may now be considered. In both
cases I must remain fixed because it is the number of fixed parameters.
Since this is true, J must become infinite; otherwise $m_1 = IJ$ will not
become infinite. The two cases then will be K fixed and K becomes
infinite. To be sure, nothing is gained if K becomes infinite but it
does introduce items of pedantic interest.

Case I: K fixed.

It is obvious either by examination of the variance-covariance
matrix or by a return to definitions that the correct normalizing
sequences are

$$n_0 = [IJ(K-1)]^{1/2}, \qquad n_1 = [IJ]^{1/2}, \qquad n_2 = [JK]^{1/2}.$$

Then the limiting covariance matrix becomes

$$\begin{pmatrix}
(\sigma_{00}+K\sigma_{01})\,\mathbf{I} & \mathbf{0} & \mathbf{0} \\
\mathbf{0}' & 2\sigma_{00}^2 & -\dfrac{2\sigma_{00}^2}{K(K-1)^{1/2}} \\
\mathbf{0}' & -\dfrac{2\sigma_{00}^2}{K(K-1)^{1/2}} & \dfrac{2}{K^2}\left[(\sigma_{00}+K\sigma_{01})^2 + \dfrac{\sigma_{00}^2}{K-1}\right]
\end{pmatrix}.$$

This matrix can be obtained either directly from the finite covariance
matrix or by the definitions of Section 4.3. In either case the above
matrix is obtained. Note that $n_0$ and $n_1$ are of the same order of
magnitude in this case. However in the second case this will not be
so.

Case  II:  K  becomes  infinite. 

'In  this  case  the  correct  normalizing  sequences  are 

nQ  =  [IJ(K-1)]2 

nx  =  [IJF 


1 


The  limiting  covariance  matrix  is  then 


In  this  case  nQ  and  n^  are  not  of  the  same  order  of  magnitude  and  so 

the  asymptotic  covariance  between  and  6^  becomes  zero.  In  case  I, 

j  l.  JL 

normalizing  all  estimators  by  n2=  [IJK]2  could  work  but  in  case  II  it 
would  net  work  at  all. 

In  both  cases,  the  maximum  likelihood  estimates  are  asymptotically 
equivalent  to  the  usual  analysis  of  variance -method  of  moments  estimates 
as  can  easily  be  verified. 
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6.4.  Example  of  an  Unbalanced  Layout- -The  One-Way  Unbalanced  Random 
Effects  Model 

The  model,  given  in  the  usual  analysis  of  variance  notation,  is 

Y±.  =  n  +  a.  +  ei;j,  5=1,2,...,  Ji#  i=l,2,...,I; 

where  y.  .  is  the  observation,  u,  is  an  unknown  mean,  the  a.  are 

3  ^  5  i 

2 

independent,  identically  distributed  as  7[( 0,0.)  and  the  e. .  are 

A  10 

2 

independent,  identically  distributed  as  71(0,0^)  and  the  a^  and  e_  are 
independent.  Listing  the  y’ s  in  lexicographic  order  the  following 
model  is  obtained. 

y  =  Xh  +  U  b  +  e, 

where 

y  is  nXl,  ^n  =  2  J^, 

X  is  an  nXl  vector  of  ones, 

<v 

a  is.  a  1x1  unknown  constant, 

is  an  nxl  standard  design  matrix  for  this 
model, 

bn  is  an  Ixl  vector  containing  the  random 

~lL 

effects, 

e  is  an  nxl  vector  containing  the  errors. 

The parameters and constants in this model and their corresponding
analysis of variance parameters and constants are

    Parameter      Corresponding ANOVA Parameter
    $\alpha$       $\mu$
    $\sigma_0$     $\sigma_e^2$
    $\sigma_1$     $\sigma_a^2$

    Constant       Corresponding ANOVA Constant
    $n$            $\sum_{i=1}^{I} J_i$
    $p_0$          1
    $p_1$          1
    $m_1$          $I$

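The case $I=2$, $J_1=2$, $J_2=3$ is illustrated.

$$y = \begin{bmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22}\\ y_{23}\end{bmatrix},\qquad
X = \begin{bmatrix}1\\1\\1\\1\\1\end{bmatrix},\qquad
U_1 = \begin{bmatrix}1&0\\1&0\\0&1\\0&1\\0&1\end{bmatrix}.$$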
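The design matrices for any set of group sizes follow the same pattern as the illustrated case. The sketch below builds $X$ and $U_1$ as nested lists; the function name `one_way_design` is an illustrative choice and is not taken from the original work.

```python
def one_way_design(group_sizes):
    """Return (X, U1) for a one-way layout with the given group sizes J_1..J_I.

    X is the n x 1 vector of ones (stored as a flat list) and U1 is the
    n x I indicator matrix with exactly one 1 in each row.
    """
    n = sum(group_sizes)
    x = [1] * n
    u1 = []
    for i, j_i in enumerate(group_sizes):
        for _ in range(j_i):
            row = [0] * len(group_sizes)
            row[i] = 1  # observation belongs to group i
            u1.append(row)
    return x, u1


x, u1 = one_way_design([2, 3])
for row in u1:
    print(row)
```

Calling it with `[2, 3]` reproduces the case $I=2$, $J_1=2$, $J_2=3$.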
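Let

$$\bar{y}_{i\cdot} = \frac{1}{J_i}\sum_{j=1}^{J_i} y_{ij}.$$

The likelihood equations are as follows. For $\alpha$ the equation is

$$\left[\sum_{i=1}^{I}\frac{J_i}{\sigma_0+J_i\sigma_1}\right]\alpha = \sum_{i=1}^{I}\frac{J_i\,\bar{y}_{i\cdot}}{\sigma_0+J_i\sigma_1}.$$

For $\sigma_0$ the equation is

$$\sum_{i=1}^{I}\left[\frac{1}{\sigma_0+J_i\sigma_1}+\frac{J_i-1}{\sigma_0}\right]
= \sum_{i=1}^{I}\left[\frac{J_i(\bar{y}_{i\cdot}-\alpha)^2}{(\sigma_0+J_i\sigma_1)^2}+\frac{\sum_{j=1}^{J_i}(y_{ij}-\bar{y}_{i\cdot})^2}{\sigma_0^2}\right].$$

For $\sigma_1$ the equation is

$$\sum_{i=1}^{I}\frac{J_i}{\sigma_0+J_i\sigma_1} = \sum_{i=1}^{I}\frac{J_i^2(\bar{y}_{i\cdot}-\alpha)^2}{(\sigma_0+J_i\sigma_1)^2}.$$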
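As a check on the form of the three likelihood equations, the sketch below evaluates left side minus right side for each of them. It assumes the equations are the stationarity conditions of the normal log likelihood of the one-way model, with $\sigma_0$ the error variance and $\sigma_1$ the variance of the random effects; the function name `likelihood_residuals` is hypothetical. In a balanced layout the maximum likelihood solution has a closed form, so all three residuals should vanish there.

```python
def likelihood_residuals(groups, alpha, s0, s1):
    """Residuals (LHS - RHS) of the alpha, sigma_0 and sigma_1 equations.

    groups is a list of lists of observations, one inner list per level i.
    """
    r_alpha = r_s0 = r_s1 = 0.0
    for g in groups:
        j_i = len(g)
        ybar = sum(g) / j_i
        ssw = sum((y - ybar) ** 2 for y in g)  # within-group sum of squares
        lam = s0 + j_i * s1
        r_alpha += j_i * (ybar - alpha) / lam
        r_s0 += 1 / lam + (j_i - 1) / s0 - j_i * (ybar - alpha) ** 2 / lam ** 2 - ssw / s0 ** 2
        r_s1 += j_i / lam - j_i ** 2 * (ybar - alpha) ** 2 / lam ** 2
    return r_alpha, r_s0, r_s1


# Balanced example: I = 3 groups of J = 3 observations.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [9.0, 10.0, 11.0]]
# Closed-form solution: alpha = grand mean, sigma_0 = SSW / (I (J - 1)),
# sigma_0 + J sigma_1 = J * sum((ybar_i - alpha)^2) / I.
print(likelihood_residuals(data, 17.0 / 3.0, 1.0, 95.0 / 9.0))
```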


These equations are very messy. They must be solved simultaneously
for $\alpha$, $\sigma_0$ and $\sigma_1$, and no real simplifications are possible. The
iteration equations are given by


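$$\sigma^{(k+1)} = B^{-1}(\sigma^{(k)})\,c[\sigma^{(k)},\alpha(\sigma^{(k)})],\qquad k=1,2,\ldots,$$

where $B(\sigma^{(k)})$ is the following $2\times 2$ matrix

$$B(\sigma^{(k)}) = \begin{bmatrix}
\displaystyle\sum_{i=1}^{I}\left\{\frac{J_i-1}{[\sigma_0^{(k)}]^2}+\frac{1}{[\sigma_0^{(k)}+J_i\sigma_1^{(k)}]^2}\right\} &
\displaystyle\sum_{i=1}^{I}\frac{J_i}{[\sigma_0^{(k)}+J_i\sigma_1^{(k)}]^2}\\[2ex]
\displaystyle\sum_{i=1}^{I}\frac{J_i}{[\sigma_0^{(k)}+J_i\sigma_1^{(k)}]^2} &
\displaystyle\sum_{i=1}^{I}\frac{J_i^2}{[\sigma_0^{(k)}+J_i\sigma_1^{(k)}]^2}
\end{bmatrix}$$

and $c[\sigma^{(k)},\alpha(\sigma^{(k)})]$ is the following $2\times 1$ vector

$$c[\sigma^{(k)},\alpha(\sigma^{(k)})] = \begin{bmatrix}
\displaystyle\sum_{i=1}^{I}\left\{\frac{J_i[\bar{y}_{i\cdot}-\alpha(\sigma^{(k)})]^2}{[\sigma_0^{(k)}+J_i\sigma_1^{(k)}]^2}+\frac{\sum_{j=1}^{J_i}(y_{ij}-\bar{y}_{i\cdot})^2}{[\sigma_0^{(k)}]^2}\right\}\\[2ex]
\displaystyle\sum_{i=1}^{I}\frac{J_i^2[\bar{y}_{i\cdot}-\alpha(\sigma^{(k)})]^2}{[\sigma_0^{(k)}+J_i\sigma_1^{(k)}]^2}
\end{bmatrix}$$

and where

$$\alpha(\sigma^{(k)}) = \frac{\displaystyle\sum_{i=1}^{I}\frac{J_i\,\bar{y}_{i\cdot}}{\sigma_0^{(k)}+J_i\sigma_1^{(k)}}}{\displaystyle\sum_{i=1}^{I}\frac{J_i}{\sigma_0^{(k)}+J_i\sigma_1^{(k)}}}.$$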

These equations are iterated until convergence is obtained. It is
easily seen that it would be difficult to perform these iterations by
hand but easy to program them on a computer. A computer program which
handles the completely general case is described in Appendix C.
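For this one-way model the iteration can be sketched in a few lines. The sketch below is not the program of Appendix C; it assumes that the entries of $B$ are the traces $\mathrm{tr}(\Sigma^{-1}G_i\Sigma^{-1}G_j)$ and the entries of $c$ are the quadratic forms $(y-X\alpha)'\Sigma^{-1}G_i\Sigma^{-1}(y-X\alpha)$ with $G_0 = I$ and $G_1 = U_1U_1'$, written out as sums over the groups. The names `alpha_hat`, `iterate_once` and `fit`, the starting values, and the stopping rule are all illustrative assumptions.

```python
def alpha_hat(groups, s0, s1):
    """Weighted mean solving the alpha equation for fixed variances."""
    num = sum(sum(g) / (s0 + len(g) * s1) for g in groups)
    den = sum(len(g) / (s0 + len(g) * s1) for g in groups)
    return num / den


def iterate_once(groups, s0, s1):
    """One step sigma -> B(sigma)^{-1} c(sigma, alpha(sigma))."""
    a = alpha_hat(groups, s0, s1)
    b00 = b01 = b11 = c0 = c1 = 0.0
    for g in groups:
        j_i = len(g)
        ybar = sum(g) / j_i
        ssw = sum((y - ybar) ** 2 for y in g)
        lam = s0 + j_i * s1
        b00 += (j_i - 1) / s0 ** 2 + 1 / lam ** 2
        b01 += j_i / lam ** 2
        b11 += j_i ** 2 / lam ** 2
        c0 += ssw / s0 ** 2 + j_i * (ybar - a) ** 2 / lam ** 2
        c1 += j_i ** 2 * (ybar - a) ** 2 / lam ** 2
    det = b00 * b11 - b01 ** 2  # solve the 2 x 2 system by Cramer's rule
    return (b11 * c0 - b01 * c1) / det, (b00 * c1 - b01 * c0) / det


def fit(groups, s0=1.0, s1=1.0, tol=1e-10, max_iter=200):
    """Iterate until successive variance estimates agree to within tol."""
    for _ in range(max_iter):
        n0, n1 = iterate_once(groups, s0, s1)
        if abs(n0 - s0) + abs(n1 - s1) < tol:
            return alpha_hat(groups, n0, n1), n0, n1
        s0, s1 = n0, n1
    raise RuntimeError("iteration did not converge")


balanced = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [9.0, 10.0, 11.0]]
print(iterate_once(balanced, 2.0, 0.5))
```

In the balanced example a single step from an arbitrary starting point lands on the closed-form estimates, which is a convenient sanity check. The sketch does no safeguarding against nonpositive variance iterates.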


6.5.  Example  of  a  Sequence  of  Designs  not  Satisfying  Assumption  4.2.4 
Recall that Assumption 4.2.4 states that for every $i$ and every
$j \in S_s$, $j \neq i$, where $i \in S_s$, there exist two nonnegative constants $R_1$
and $R_2$, both less than or equal to one, such that

$$\sum_{\ell=1}^{m_j}\left[\frac{u_\ell^{(j)\prime}u_k^{(i)}}{u_k^{(i)\prime}u_k^{(i)}}\right]^2 \le R_1$$

for all but $R_2 m_i$ values of $k$ in the set $\{1,2,\ldots,m_i\}$, where $u_k^{(i)}$ is the
$k$th column of $U_i$ and $u_\ell^{(j)}$ is the $\ell$th column of $U_j$. Furthermore, $R_1$ and
$R_2$ are such that

$$R_1 + R_2(1-R_1) < \frac{1}{N(S_s)+1},$$

where $N(S_s)$ is the number of indices in the set $S_s$.

The following is an example of a sequence of designs ruled out by
this assumption. It is given to illustrate that design sequences
eliminated by this assumption are not design sequences that would be
interesting in any event. Let $p_1 = 2$ and let $U_1$, $n \times m_1$, be any legitimate
design matrix (i.e. exactly one 1 in each row) with at least two 1's
in each column. Construct $U_2$, $n \times (m_1+1)$, as follows. For each
$j=1,2,\ldots,m_1$ the $(j+1)$th column of $U_2$ is just like the $j$th column of
$U_1$ with one exception. The last 1 in the column is made a zero and
that 1 is placed in the same row in column 1 of $U_2$. For instance, a
typical $U_1$ and $U_2$ might be


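$$U_1 = \begin{bmatrix}1&0\\1&0\\1&0\\0&1\\0&1\\0&1\end{bmatrix}
\qquad\text{and}\qquad
U_2 = \begin{bmatrix}0&1&0\\0&1&0\\1&0&0\\0&0&1\\0&0&1\\1&0&0\end{bmatrix}.$$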

Now $m_2 = m_1+1$, so $m_1$ and $m_2$ clearly have the same order of magnitude
if a sequence of $U_1$'s and corresponding $U_2$'s constructed by the above
method is considered. Furthermore $\nu_1 = m_1-1$ and $\nu_2 = m_1 = m_2-1$ by the
method of construction. That is, $U_2$ has been constructed so that the
columns of $U_1$ and $U_2$ have no linear dependencies except the one forced
by the constraint that there is one 1 in each row. Thus Assumption
4.2.3 is satisfied. However, Assumption 4.2.4 is not. Let $u_k^{(1)\prime}u_k^{(1)} = d_k$.
Then

$$u_1^{(2)\prime}u_k^{(1)} = 1,\qquad
u_{k+1}^{(2)\prime}u_k^{(1)} = d_k - 1,\qquad
u_\ell^{(2)\prime}u_k^{(1)} = 0,\quad \ell=2,3,\ldots,m_2,\ \ell \neq k+1.$$


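But then for all $k=1,2,\ldots,m_1$

$$\sum_{\ell=1}^{m_2}\left[\frac{u_\ell^{(2)\prime}u_k^{(1)}}{u_k^{(1)\prime}u_k^{(1)}}\right]^2
= \left[\frac{1}{d_k}\right]^2 + \left[\frac{d_k-1}{d_k}\right]^2
= \frac{1+(d_k-1)^2}{d_k^2}
= 1 - \frac{2(d_k-1)}{d_k^2}.$$

This is a monotonically increasing function for $d_k \ge 2$, so its
minimum occurs for $d_k = 2$ and gives the bound (since $d_k \ge 2$ for all
$k=1,2,\ldots,m_1$ by definition of this problem)

$$\sum_{\ell=1}^{m_2}\left[\frac{u_\ell^{(2)\prime}u_k^{(1)}}{u_k^{(1)\prime}u_k^{(1)}}\right]^2 \ge 1 - \frac{2}{4} = \frac{1}{2}.$$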
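The construction of $U_2$ from $U_1$ and the resulting overlap can be checked numerically. The sketch below follows the construction rule stated in the text; the normalization used in `overlap_sum` (each inner product divided by the column count $d_k$ before squaring) is an assumption about the exact form of the quantity in Assumption 4.2.4, and the function names are hypothetical.

```python
def build_u2(u1):
    """Move the last 1 of each column of U1 into column 1 of U2, as described above."""
    n, m1 = len(u1), len(u1[0])
    u2 = [[0] * (m1 + 1) for _ in range(n)]
    for j in range(m1):
        rows = [r for r in range(n) if u1[r][j] == 1]
        for r in rows[:-1]:
            u2[r][j + 1] = 1   # copy of column j, minus its last 1
        u2[rows[-1]][0] = 1    # the removed 1 goes to the first column
    return u2


def column(mat, j):
    return [row[j] for row in mat]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def overlap_sum(u1, u2, k):
    """Sum over columns l of U2 of (u_l' u_k / d_k)^2 for column k of U1."""
    u_k = column(u1, k)
    d_k = dot(u_k, u_k)
    return sum((dot(column(u2, l), u_k) / d_k) ** 2 for l in range(len(u2[0])))


u1 = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
u2 = build_u2(u1)
print(u2)
print([overlap_sum(u1, u2, k) for k in range(2)])
```

With three 1's per column the sum is $(1 + 2^2)/3^2 = 5/9$ for each $k$, which is above the lower bound of $1/2$.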


But Assumption 4.2.4 demands that, at least for a proportion of the $k$,
the quantity in question be less than or equal to $R_1$. Clearly
then it is impossible to choose any nonnegative $R_1$ less than $\tfrac{1}{2}$
for which this holds. Thus Assumption 4.2.4 cannot be satisfied for such a design
sequence. But such a sequence of designs would never be of interest
in any case since almost all treatment combinations overlap. In general,
cases that get ruled out by Assumption 4.2.4 are not of practical
interest. All reasonable experimental design sequences pass this
assumption.

Another interesting fact is that some sort of assumption like
Assumption 4.2.4 is necessary for Theorem 4.4.1. This is true because
it can be shown that when $U_1$ above is a standard design matrix for the
row effects in a one-way balanced layout where the number of ones in
each column goes to infinity, then the matrix $J$ needed for Theorem 4.4.1
is singular. But, as shown above, Assumption 4.2.3 will still hold for
this case. Thus some stronger assumption than Assumption 4.2.3 is
needed for Theorem 4.4.1. Assumption 4.2.4 is such an assumption and,
as seen above, does not rule out any design sequences of consequence.


CHAPTER  7 


SUMMARY 

7.1.  Summary 

The results of this paper concerning maximum likelihood estimates
in the mixed model of the analysis of variance can be divided into two
areas--asymptotic theory and computational procedures. It was proved
that under quite general assumptions, the maximum likelihood estimates
were consistent and asymptotically efficient in the sense of attaining
the Cramér-Rao lower bound for the covariance matrix. A computational
procedure was proposed and this procedure performed well in Monte Carlo
studies.

In considering asymptotic properties for this model it was
necessary to depart from the usual method of proof of such properties.
The basic outlines of the proofs (Theorems 3.3.1 and 4.4.1) look
similar to proofs used in other situations. The critical difference
is that in the model considered here, it is necessary that the sequence
of estimates estimating each parameter be allowed to have a separate
normalizing sequence. This necessity is most easily recalled by noting
that even in simple balanced models, the sums of squares have degrees
of freedom which may be of different orders of magnitude. Having a
different normalizing sequence for each parameter causes many problems
if an attempt is made to push through the new formulation into the old
proofs. The Wald-Wolfowitz proof of consistency breaks down in several
places. The Cramér proof, which is used here, does not go through
directly. However, by the artifice described in Section 4.3--building
the normalizing sequence into the parameter--it is possible to restate
the Taylor series expansion type proof in a way that makes sense and
is provable.

The idea of different normalizing sequences is the major extension
of previous work in this paper. It allows previous results concerning
the analysis of variance to be extended by allowing assumptions to be
modified. (The assumptions in this paper have been shown to be restrictive
only in the sense of eliminating sequences of designs that would
not have been of interest in any case.) The rest of the work on
asymptotic theory essentially consists of proving the conditions of
Theorem 4.4.1 by brute force. An important development contained deep
in the detail sections is the idea of separating the length of $y$ into
components relating to linear spaces generated by the $U$ matrices and
then "peeling off" only as much $y$ (and hence probability) as can be
taken care of by the normalizing constant that is available. This is
done in Appendix A, where the partitioning is defined in Section A.2
and its use is remarked on prior to Proposition A.3.5.

The computational procedure called The Iterative Procedure, which
was proposed in Section 5.4, can be motivated in three different ways.
Anderson (1971b), who originally proposed it, considered it as an
analogy to a certain least squares problem. It has been pointed out
here that the problem can be posed as one of functional iteration.
J. N. K. Rao (1973) pointed out that the method was in effect the method
of scoring. Together these motivations cause one to have a good deal
of faith in the procedure, but the final verification of a procedure's
standing should be a proof or demonstration that the good properties
mentioned in Section 5.4--guaranteed convergence to a solution of the
likelihood equations--hold. It has not been proved here that convergence
is sure, but in Monte Carlo studies The Iterative Procedure converged
over 99% of the time and when it converged it always converged
to a solution of the likelihood equations. It was also proved that in
a case where closed form solutions to the likelihood equations exist,
the procedure will converge to them in one iteration from any initial
guess. It can be said that even though proofs of desirable properties
do not yet exist, there is ample evidence to conclude that The Iterative
Procedure is a very good method for the solution of the likelihood
equations in this problem.

This paper would not be complete without mentioning something about
the rationale for using maximum likelihood in the analysis of variance.
Hartley and Rao (1967) advanced five reasons which were paraphrased in
Chapter 2. These reasons are validated even more by the work in this
paper. The easy solutions by computer referred to by Hartley and Rao
have been made even easier by the addition of The Iterative Procedure,
the large sample optimality they cited has been expanded to cover many
more cases, and the other reasons remain valid. Thus as a unified
theory to cover all cases of balanced or unbalanced models, maximum
likelihood has much to recommend it. In the case of balanced models,
it has been shown for several models in Chapter 6 that the usual analysis of
variance estimates are asymptotically equivalent to the maximum likelihood
estimates. This seems to be a general rule. Therefore, it seems
that in these balanced models the usual estimates can be used since
they do have some small sample optimality properties as well as sharing
the large sample optimality properties; also the average user is more
familiar with these techniques. The use of the balanced case is to
show how well maximum likelihood works there (as in the Monte Carlo
studies reported in Section 5.8) to give confidence in its use in the
unbalanced case where the usual estimates are very difficult to calculate.
In summary, maximum likelihood is a good method to use for the mixed
model of the analysis of variance in any situation but is especially
important for practical use in the unbalanced layouts where other
methods are very difficult to apply.


7.2.  Subjects  for  Further  Research 

There are several aspects of the topics covered in this paper
which may prove fruitful areas for further research. One is an attempt
to prove the good properties of The Iterative Procedure. This may in
fact be impossible; every attempt thus far has been stymied by the
very difficult arrangement of the nonlinear likelihood equations.
However, if such proofs are possible, The Iterative Procedure will
certainly be the leading candidate for the best algorithm to compute
the maximum likelihood estimates. Another area for further research
concerns asymptotic theory. Some research on just how large a design
should be for the good asymptotic properties to be approximately true
would be very helpful. This might be done in a theorem like those
used for the ordinary central limit theorem. More probably it will
have to be done with Monte Carlo studies.

Another area where further research would be of practical importance
is the area of likelihood ratio tests. Anderson (1969) showed
that likelihood ratio tests are easy to derive and that the criterion
is just the ratio of determinants of $\Sigma$ matrices computed under different
models. Hartley and Rao (1967) also note that likelihood ratio tests
can be used especially easily to test the hypothesis that some of the
$\sigma_i$'s are zero, since the reduced model is just another model of the
same form. Both Anderson and Hartley and Rao point out that the likelihood
ratio test criterion should be asymptotically distributed as a $\chi^2$
random variable under the null hypothesis. However, Anderson's results
are for the case where the entire experiment is replicated and Hartley
and Rao's results depend on their asymptotic theory with its restricted
assumptions. Neither set of results can be extended to the general
case using the asymptotic results of this paper because the assumption
here is that all $\sigma_i$ are positive; this assumption of positive $\sigma_i$ is
critical at several steps of the proof. Under the null hypothesis
some of the $\sigma_i$ equal zero so the results of this paper do not apply.
(In fact, the asymptotic distributions in certain simple models are
definitely not normal.) Further research which will deal with the
asymptotic behavior when some of the $\sigma_i$ are zero will be very useful,
both for its own sake and in dealing with likelihood ratio tests.


APPENDIX  A 


DETAILS  FROM  CHAPTER  4 

A.l.  Definition  of  a  Certain  Orthogonal  Matrix 


$R^n$ can be partitioned into orthogonal subspaces as follows. Let
$\mathcal{U}_0$ be such that $R^n = \mathcal{U}_0 \oplus \mathcal{L}(U_1:\ldots:U_{p_1})$ and $\mathcal{U}_0$ is orthogonal to
$\mathcal{L}(U_1:\ldots:U_{p_1})$. For $s=1,2,\ldots,c-1$ let $\mathcal{U}_s$ be such that

$$\mathcal{L}(U_{i_s}:\ldots:U_{p_1}) = \mathcal{U}_s \oplus \mathcal{L}(U_{i_{s+1}}:\ldots:U_{p_1})$$

and $\mathcal{U}_s$ is orthogonal to $\mathcal{L}(U_{i_{s+1}}:\ldots:U_{p_1})$. Let $\mathcal{U}_c = \mathcal{L}(U_{i_c}:\ldots:U_{p_1})$. (The $U_i$ matrices are as
defined in Section 1.3 and the partition of $\{1,2,\ldots,p_1\}$ into sets $S_s$
is as described in Section 4.2.) Then there are $c+1$ mutually orthogonal
vector spaces such that $R^n = \mathcal{U}_0 \oplus \mathcal{U}_1 \oplus \ldots \oplus \mathcal{U}_c$. Let the dimension
of $\mathcal{U}_s$ be $m_s$ and let $H_s$ be an orthonormal basis for $\mathcal{U}_s$. Then

$$H_{s_1}'H_{s_2} = \begin{cases} 0, & s_1 \neq s_2,\\ I_{m_{s_1}}, & s_1 = s_2.\end{cases}$$

Thus $P = [H_0:H_1:\ldots:H_c]$ is an $n \times n$ orthogonal matrix; that is,
$P'P = I = PP'$. Furthermore, for any $s=0,1,\ldots,c$ and any $i \in S^*_{s+1}$, $U_i'H_s = 0$
because the columns of $H_s$ are orthogonal to all vectors in
$\mathcal{L}(U_{i_{s+1}}:U_{i_{s+1}+1}:\ldots:U_{p_1})$. This yields

$$\left(\sum_{i=0}^{p_1} b_iG_i\right)H_s = \sum_{i=0}^{p_1} b_iU_iU_i'H_s \qquad (\text{Let } U_0 \equiv I_n.)$$

$$= \sum_{i=0}^{i_{s+1}-1} b_iU_iU_i'H_s = \sum_{i=0}^{i_{s+1}-1} b_iG_iH_s.$$


Now since $\Sigma_0 = \sum_{i=0}^{p_1}\sigma_{0i}G_i$ is positive definite, there exists a lower
triangular matrix $A$ such that $\Sigma_0 = AA'$. But $P'\Sigma_0P$ is also positive
definite and hence there exists $\tilde{T}$ upper triangular such that $P'\Sigma_0P = \tilde{T}'\tilde{T}$.
Then if $T = \tilde{T}^{-1}$, $T$ is also upper triangular and $Q = A'PT$ is an $n \times n$ matrix
with the following properties.

1) $$Q'Q = T'P'AA'PT = T'P'\Sigma_0PT = (\tilde{T}')^{-1}\tilde{T}'\tilde{T}\tilde{T}^{-1} = I;$$

that is, $Q$ is orthogonal.
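The orthogonality of $Q = A'PT$ is easy to confirm numerically. The sketch below uses a $2 \times 2$ positive definite matrix and a rotation for $P$; both are arbitrary illustrative choices, as are the helper names, and the upper triangular factor is obtained here as the transpose of a lower Cholesky factor.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def chol_lower_2x2(s):
    """Lower triangular L with L L' = s for a 2 x 2 positive definite s."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


def inv_upper_2x2(t):
    a, b, d = t[0][0], t[0][1], t[1][1]
    return [[1 / a, -b / (a * d)], [0.0, 1 / d]]


sigma0 = [[2.0, 0.5], [0.5, 1.0]]            # stands in for Sigma_0
theta = 0.3
p = [[math.cos(theta), -math.sin(theta)],     # any orthogonal P will do
     [math.sin(theta), math.cos(theta)]]

a_mat = chol_lower_2x2(sigma0)                               # Sigma_0 = A A'
m = matmul(matmul(transpose(p), sigma0), p)                  # P' Sigma_0 P
t_tilde = transpose(chol_lower_2x2(m))                       # upper, T~' T~ = P' Sigma_0 P
t = inv_upper_2x2(t_tilde)                                   # T = T~^{-1}
q = matmul(matmul(transpose(a_mat), p), t)                   # Q = A' P T
qtq = matmul(transpose(q), q)
print(qtq)
```

The printed product should agree with the identity up to rounding error.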
2) $$Q = A'[H_0:H_1:\ldots:H_c]\,T = [Q_0:Q_1:\ldots:Q_c],$$

where $Q_s = A'H^*_s = A'\sum_{t=0}^{s}H_tT_{ts}$ and everything is partitioned so that all
multiplications can be properly carried out.

Then it is true that

$$\left(\sum_{i=0}^{p_1} b_iG_i\right)H^*_s = \left(\sum_{i=0}^{p_1} b_iU_iU_i'\right)\left(\sum_{t=0}^{s}H_tT_{ts}\right)$$

because $H^*_s$ only involves $H_t$ for $t \le s$. For $t \le s$, $i \in S^*_{s+1}$ implies
$i \in S^*_{t+1}$; thus $G_iH_t = 0$. Therefore $G_iH^*_s = 0$ for $i \in S^*_{s+1}$.


Therefore 


3) $Q$ is orthogonal so $Q'Q = I$. This implies

A.2. Definition of Condition A.2.1

Let $Q = [Q_0:Q_1:\ldots:Q_c]$ be defined as in Section A.1, where each
$Q_s$ is $n \times m_s$. Then if $\Sigma_0 = AA'$ as usual, $z = A^{-1}(y - X\alpha_0) \sim N_n(0, I_n)$
whenever $y \sim N_n(X\alpha_0, \Sigma_0)$. If $w$ is defined by $w = Q'z$, then $w \sim N_n(0, I_n)$
also. But $w$ can be written $w = [w_0', w_1', \ldots, w_c']'$, where
$w_s = Q_s'z = Q_s'A^{-1}(y - X\alpha_0)$. Each $w_s$ is $m_s \times 1$ and $w_s \sim N_{m_s}(0, I_{m_s})$ for
$s=0,1,\ldots,c$. Condition A.2.1 is defined as follows:

CONDITION A.2.1. For $w_s$ as defined above, $\dfrac{w_s'w_s}{m_s} \le 10$, $s=0,1,\ldots,c$.

The following proposition concerning Condition A.2.1 is true.

PROPOSITION A.2.1. Under Assumptions 1.3.1-1.3.6 and 4.2.1-4.2.5,
$P\{\text{Condition A.2.1 is true}\} \to 1$ as $n \to \infty$.

PROOF.
But $w_s'w_s \sim \chi^2_{m_s}$ and hence $E(w_s'w_s) = m_s$ and $\mathrm{Var}(w_s'w_s) = 2m_s$ for $s=0,1,\ldots,c$,
so by a simple application of Chebychev's Inequality
$P\{\text{Condition A.2.1 is not true}\}$ is bounded by a sum over $s=0,1,\ldots,c$ of constant multiples of $1/m_s$. But each $m_s \ge \nu_i$ for
some $i \in S_s$, where $\nu_i$ is as defined in Section 4.2. Thus each $m_s \to \infty$
as $n \to \infty$ by Assumptions 4.2.1 and 4.2.4. This implies that
$P\{\text{Condition A.2.1 is not true}\} \to 0$ as $n \to \infty$, which proves the
proposition. |||
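The Chebychev step can be written out explicitly. The following is a sketch under the assumption that Condition A.2.1 has the form $w_s'w_s \le C\,m_s$ for a fixed constant $C > 1$; the symbol $C$ is introduced here only for illustration (Section A.3 appears to use the value 10).

```latex
\begin{align*}
P\{w_s'w_s > C\,m_s\}
  &\le P\{\,|w_s'w_s - m_s| > (C-1)\,m_s\,\} \\
  &\le \frac{\operatorname{Var}(w_s'w_s)}{(C-1)^2 m_s^2}
   = \frac{2}{(C-1)^2\,m_s},
\end{align*}
so that, summing over the $c+1$ components,
\[
P\{\text{Condition A.2.1 is not true}\}
  \le \sum_{s=0}^{c} \frac{2}{(C-1)^2\,m_s}.
\]
```

Each term tends to zero as $m_s \to \infty$, and the number of terms $c+1$ is fixed, which is all the argument needs.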


A.3. Bounds for Various Inner Products and the Characteristic Roots
of Various Matrices

In this section bounds for all the terms which must be dealt with in
Section A.4 are found. These terms are either traces or inner products.
The traces are of the form $\mathrm{tr}\,E_1G_iE_2G_j$ and $\mathrm{tr}\,E_1G_iE_2G_jE_3G_k$ for certain
choices of $E_1$, $E_2$ and $E_3$; the inner products are of the form
$\xi'X'EX\eta$, $\xi'X'E(y - X\alpha_0)$ and $(y - X\alpha_0)'E(y - X\alpha_0)$, where $E$ equals
$E_1G_iE_2$, $E_1G_iE_2G_jE_3$ or $E_1G_iE_2G_jE_3G_kE_4$ for various choices of $E_1$, $E_2$, $E_3$,
$E_4$, $\xi$ and $\eta$. Bounds will be established for many characteristic
roots leading up to bounding the desired inner products and traces.

Let $b > 0$ be given and let $0 < \delta < b/2$. Let $\gamma_{1n} \in S_b(\gamma_{0n})$ and let
$\gamma_{2n} \in S_\delta(\gamma_{1n})$. Let $\gamma_{an} = [\alpha_a', \tau_{a0}, \tau_{a1}, \ldots, \tau_{ap_1}]'$, $a=0,1,2$, and recall
that $\gamma_{1n} \in S_b(\gamma_{0n})$ implies $\|\gamma_{1n} - \gamma_{0n}\| < b$, which in turn implies that
$|\tau_{1i} - \tau_{0i}| < b$ for all $i=0,1,\ldots,p_1$ and $|\alpha_{1j} - \alpha_{0j}| < b$ for all
$j=1,2,\ldots,p_0$. Similarly $|\tau_{2i} - \tau_{1i}| < \delta$ and $|\alpha_{2j} - \alpha_{1j}| < \delta$. Recall
that $\sigma_{0i} = \tau_{0i}/n_i$. Let the following condition hold.

CONDITION A.3.1. For a fixed $b > 0$ and with $\sigma_{0i}$ and $n_i$ defined as in
Chapters 1 and 4, $\dfrac{\sigma_{0i}}{2} > \dfrac{b}{n_i}$, $i=0,1,\ldots,p_1$.

It is always possible that Condition A.3.1 holds because the $n_i$ are
sequences increasing to infinity and all $\sigma_{0i}$ are positive. Several
consequences follow immediately from the above definitions.

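PROPOSITION A.3.1. For $\gamma_{0n}$, $\gamma_{1n}$ and $\gamma_{2n}$ as defined above, if Condition
A.3.1 is true, then the following statements are true.

$$(\alpha_0 - \alpha_1)'(\alpha_0 - \alpha_1) \le p_0b^2, \qquad (\alpha_1 - \alpha_2)'(\alpha_1 - \alpha_2) \le p_0\delta^2,$$

and

$$0 < \frac{\sigma_{0i}}{2} < \frac{\tau_{1i}}{n_i} < \frac{3\sigma_{0i}}{2}, \qquad
0 < \frac{\sigma_{0i}}{4} < \frac{\tau_{2i}}{n_i} < \frac{7\sigma_{0i}}{4}, \qquad i=0,1,\ldots,p_1.$$

PROOF.

$$(\alpha_0 - \alpha_1)'(\alpha_0 - \alpha_1) = \sum_{j=1}^{p_0}(\alpha_{0j} - \alpha_{1j})^2 \le p_0b^2,$$

$$(\alpha_1 - \alpha_2)'(\alpha_1 - \alpha_2) = \sum_{j=1}^{p_0}(\alpha_{1j} - \alpha_{2j})^2 \le p_0\delta^2.$$

Also

$$\frac{\tau_{1i}}{n_i} = \frac{\tau_{0i}}{n_i} + \frac{\tau_{1i} - \tau_{0i}}{n_i} \le \sigma_{0i} + \frac{b}{n_i} < \frac{3\sigma_{0i}}{2}$$

and

$$\frac{\tau_{1i}}{n_i} \ge \sigma_{0i} - \frac{b}{n_i} > \frac{\sigma_{0i}}{2} > 0.$$

Similarly

$$\frac{\tau_{2i}}{n_i} = \frac{\tau_{0i}}{n_i} + \frac{\tau_{1i} - \tau_{0i}}{n_i} + \frac{\tau_{2i} - \tau_{1i}}{n_i}
\le \sigma_{0i} + \frac{b}{n_i} + \frac{\delta}{n_i} < \sigma_{0i} + \frac{\sigma_{0i}}{2} + \frac{\sigma_{0i}}{4} = \frac{7\sigma_{0i}}{4}$$

and

$$\frac{\tau_{2i}}{n_i} \ge \sigma_{0i} - \frac{b}{n_i} - \frac{\delta}{n_i} > \frac{\sigma_{0i}}{4} > 0. \quad |||$$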
Now define $T_a = \sum_{i=0}^{p_1}\dfrac{\tau_{ai}}{n_i}G_i$, $a=0,1,2$, and recall $T_0 = \Sigma_0$.
Proposition A.3.1 is used to prove the following proposition.

PROPOSITION A.3.2. If Condition A.3.1 is true, the following statements
are true.


WSV  4  tr  *  4 1  ’  WSV  4  ?  >  WEiV  4  2> 


WJiV  4  4’  s  I  >  ’w®1!, )  s  6  , 


max  ~1  xg7  2  5  maxv~2  ~i' 


M,C.  JW  ‘Tsr-Iv- 

’  1=0,1,...^ 


max  |Xk[^  (Ti-So)  3 1  =  «  I  VS' 'ScfJ.l>’l 

X-—-L}1-}  •  •  •  -v _ •,  o 


min  ln.cr~. 


k=l,2,...,n 


1=0,1, . . .  ,p 


max  |X.  [T^CT.-Sh)]!  =  max  {X.  [Tr1(S_-T-  )  ]  i  <: - t — - t-, 

1~1  ~1~°  ~°~1  '  .  1  ia0p 

I- «  -a  t» 


^1,2,.  ..,n  k=l,2,...,nK  ^  i=0,^..,£f0i 


max  |x,[T9  (T.-Tp)]|  -  max  ]\,[Tp  (Tp-^  )  ]  j 


I  k~2  V^l^/Ji  “  ,  “  ,/lkL^2  4-t2^1/J1  J  min  (n.a.. 

k=!,2, . . . ,n  k=l,2, . . . ,n  i=0,l, . . . ,pj  0l 
For the remaining inequalities,

$$\max_{k=1,2,\ldots,n}\left|\lambda_k[\Sigma_0^{-1}(T_1 - T_2)]\right|
= \max_{k=1,2,\ldots,n}\left|\frac{x_k'(T_1 - T_2)x_k}{x_k'\Sigma_0x_k}\right|$$

by Lemma B.6,

$$= \max_{k=1,2,\ldots,n}\left|\frac{\sum_{i=0}^{p_1}\frac{\tau_{1i} - \tau_{2i}}{n_i}\,x_k'G_ix_k}{\sum_{i=0}^{p_1}\sigma_{0i}\,x_k'G_ix_k}\right|
\le \max_{k=1,2,\ldots,n}\frac{\sum_{i=0}^{p_1}\frac{|\tau_{1i} - \tau_{2i}|}{n_i}\,x_k'G_ix_k}{\sum_{i=0}^{p_1}\sigma_{0i}\,x_k'G_ix_k}$$

because $\sigma_{0i} > 0$ and $x_k'G_ix_k \ge 0$,

$$\le \max_{i=0,1,\ldots,p_1}\frac{|\tau_{1i} - \tau_{2i}|}{n_i\sigma_{0i}}
\le \frac{\delta}{\min_{i=0,1,\ldots,p_1}(n_i\sigma_{0i})}$$

by Lemma B.3. Similarly,

$$\max_{k=1,2,\ldots,n}\left|\lambda_k[\Sigma_0^{-1}(T_1 - \Sigma_0)]\right|
\le \max_{i=0,1,\ldots,p_1}\frac{|\tau_{1i}/n_i - \sigma_{0i}|}{\sigma_{0i}}
= \max_{i=0,1,\ldots,p_1}\frac{|\tau_{1i} - \tau_{0i}|}{n_i\sigma_{0i}}
\le \frac{b}{\min_{i=0,1,\ldots,p_1}(n_i\sigma_{0i})},$$

$$\max_{k=1,2,\ldots,n}\left|\lambda_k[T_1^{-1}(T_1 - \Sigma_0)]\right|
\le \max_{i=0,1,\ldots,p_1}\frac{|\tau_{1i}/n_i - \sigma_{0i}|}{\tau_{1i}/n_i}
< \frac{2b}{\min_{i=0,1,\ldots,p_1}(n_i\sigma_{0i})},$$

$$\max_{k=1,2,\ldots,n}\left|\lambda_k[T_1^{-1}(T_1 - T_2)]\right|
\le \max_{i=0,1,\ldots,p_1}\frac{|(\tau_{1i} - \tau_{2i})/n_i|}{\tau_{1i}/n_i}
\le \max_{i=0,1,\ldots,p_1}\frac{\delta/n_i}{\sigma_{0i}/2}
= \frac{2\delta}{\min_{i=0,1,\ldots,p_1}(n_i\sigma_{0i})},$$

$$\max_{k=1,2,\ldots,n}\left|\lambda_k[T_2^{-1}(T_1 - T_2)]\right|
\le \max_{i=0,1,\ldots,p_1}\frac{|(\tau_{1i} - \tau_{2i})/n_i|}{\tau_{2i}/n_i}
< \frac{4\delta}{\min_{i=0,1,\ldots,p_1}(n_i\sigma_{0i})}. \quad |||$$


PROPOSITION A.3.3. Let $E_1$, $E_2$, $E_3$ and $E_4$ be any $n \times n$ symmetric matrices
and $\Sigma_0 = AA'$; then if Condition A.3.1 is true the following statements
are true.

$$\lambda_{\max}(A'E_1G_iE_2AA'E_2G_iE_1A) \le \max_{k}|\lambda_k(\Sigma_0E_1)|^2\,\max_{k}|\lambda_k(\Sigma_0E_2)|^2\,\frac{1}{\sigma_{0i}^2},$$

$$\lambda_{\max}(A'E_1G_iE_2G_jE_3AA'E_3G_jE_2G_iE_1A) \le \prod_{a=1}^{3}\max_{k}|\lambda_k(\Sigma_0E_a)|^2\,\frac{1}{\sigma_{0i}^2\sigma_{0j}^2},$$

$$\lambda_{\max}(A'E_1G_iE_2G_jE_3G_kE_4AA'E_4G_kE_3G_jE_2G_iE_1A) \le \prod_{a=1}^{4}\max_{\ell}|\lambda_\ell(\Sigma_0E_a)|^2\,\frac{1}{\sigma_{0i}^2\sigma_{0j}^2\sigma_{0k}^2},$$

where each maximum is taken over the $n$ characteristic roots.

PROOF.

The proof is given for the first case only; the other cases are
proved analogously.

$$\lambda_{\max}(A'E_1G_iE_2AA'E_2G_iE_1A)
= \lambda_{\max}(A'E_1AA^{-1}G_iA^{-t}A'E_2AA'E_2AA^{-1}G_iA^{-t}A'E_1A)$$

$$\le \lambda_{\max}(A'E_1AA'E_1A)\,\lambda_{\max}(A^{-1}G_iA^{-t}A^{-1}G_iA^{-t})\,\lambda_{\max}(A'E_2AA'E_2A)$$

by two applications of Lemma B.8,

$$= \lambda_{\max}(\Sigma_0E_1)^2\,\lambda_{\max}(\Sigma_0^{-1}G_i)^2\,\lambda_{\max}(\Sigma_0E_2)^2$$

by Lemma B.9,

$$\le \max_{k=1,2,\ldots,n}|\lambda_k(\Sigma_0E_1)|^2\,\max_{k=1,2,\ldots,n}|\lambda_k(\Sigma_0E_2)|^2\,\frac{1}{\sigma_{0i}^2}$$

by Lemmas B.14 and B.7. |||


Proposition  A.3.4  deals  with  inner  products. 

PROPOSITION  A. 3. 4.  Let  £  and  be  any  pQXl  vectors  and  5£,F..  and 
any  nxn  matrices;  then  the  following  statements  are  true.  --  -  - 


[C.,X/P1P0(y-X».)]‘:  £  (g/g-)  X  (X ,E~1X)  1  (A'F,AA'F_A) 


-t .  -1 


'  (x-SSo^^TT 


£o£l~2^“So^  ^  ^max^~  ~1^  ~1^  ^~^o^Eo^  ~ 


(Xr&o>  V^fe-£o 
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PROOF. 


by  Lemma  B.12, 

^  (l1/|J(5'|0)\2  (x'z~h)\  (a'f'aa'f^a) 

v^3L*l  ~2>2>2  max  ~  ~ 0  ~  max  ~  ~1<^~  ~1~' 

by  definition  of  characteristic  root.  The  other  cases  are  proved 
analogously,  j  j  j 


Now $A'F_1'F_1A$ of Proposition A.3.4 will be of the correct form
to plug into Proposition A.3.3. It remains to bound terms of the form
$(y - X\alpha_0)'F_2'A^{-t}A^{-1}F_2(y - X\alpha_0)$. This is done using Condition A.2.1. Let $Q$
be defined as in Section A.1 and let $w$ be as defined in Section A.2.
Then

$$(y - X\alpha_0) = Az = AQw = A[Q_0:Q_1:\ldots:Q_c]\begin{bmatrix}w_0\\ w_1\\ \vdots\\ w_c\end{bmatrix} = A\sum_{s=0}^{c}Q_sw_s.$$

This yields

$$(y - X\alpha_0)'F_2'A^{-t}A^{-1}F_2(y - X\alpha_0) = \sum_{s=0}^{c}\sum_{s^*=0}^{c} w_s'Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_{s^*}w_{s^*}.$$

The Cauchy-Schwarz inequality gives a bound for each term as

$$(w_s'Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_{s^*}w_{s^*})^2
\le (w_s'Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_sw_s)(w_{s^*}'Q_{s^*}'A'F_2'A^{-t}A^{-1}F_2AQ_{s^*}w_{s^*}).$$

But

$$w_s'Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_sw_s \le w_s'w_s\,\lambda_{\max}(Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_s)$$

and $w_s'w_s \le 10\,m_s$ by Condition A.2.1. Thus bounds are required on
$\lambda_{\max}(Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_s)$.


These  are  provided 
because $Q'Q = I$,

$$\le \sup_{x \neq 0}\frac{x'A^{-1}G_iA^{-t}A^{-1}G_iA^{-t}x}{x'x} = \lambda_{\max}^2(\Sigma_0^{-1}G_i) \le \frac{1}{\sigma_{0i}^2}.$$

However, if $i \in S^*_{s+1}$ then $G_iA^{-t}Q_s = G_iH^*_s = 0$ as was shown in Section A.1.
Then the matrix in question is the zero matrix and hence has characteristic
roots all zero. |||

PROPOSITION A.3.6. If $F_2 = (\Sigma_0 - T_1)\Sigma_0^{-1}$, where $T_1$ is as above, and if
Condition A.3.1 is true, then the following statement is true.

$$\lambda_{\max}(Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_s) \le \frac{b^2}{\left[\min_{i=0,1,\ldots,i_{s+1}-1}(n_i\sigma_{0i})\right]^2}.$$


PROOF. 


-1_ 


XTnov(Q'A/F'A“&A“JF0AQ  )  =  X  (q'A-1^"1 n  )A~'GA"X(Sn-T .  )A_liQ  ) 


-t . -1/ 


,-t. 


But 


(T '-L)a\=  (  2  — -1'  ■  11  -  G.)  H 
~1  ~0  ~  ~s  \.  n.  ~i/  ~s 


* 


i=0 
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min 


i-°>1 . vr! 


by  Lemma  B.7  just  as  in  Proposition  A. 3. 2.  | j | 

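PROPOSITION A.3.7. If $F_2 = (T_1 - T_2)\Sigma_0^{-1}$, where $T_1$ and $T_2$ are as above, and
if Condition A.3.1 is true, then the following statement is true.

$$\lambda_{\max}(Q_s'A'F_2'A^{-t}A^{-1}F_2AQ_s) \le \frac{\delta^2}{\left[\min_{i=0,1,\ldots,i_{s+1}-1}(n_i\sigma_{0i})\right]^2}.$$

PROOF.

Proceed exactly as in the proof of Proposition A.3.6 but at the last
step, instead of $|\tau_{1i} - \tau_{0i}| < b$ use $|\tau_{2i} - \tau_{1i}| < \delta$. |||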


There  are  two  different  types  of  F^  which  will  be  encountered. 
However,  the  necessary  bounds  reduce  to  those  of  the  form  above  as  will 
be  seen.  The  first  different  F0  is  F0=  G.T"1.  But 

~2  r^c.  ~i~l 


Si1  =  S1  +  tx  -  S1 


“So  +  £ 


„-l 


vjiiere  4  =  T^-  ■£  =  =  I'X-ipgt  Therefore 

T^T"1  =  E^E"1  +  ABA  +  E"  ■^A  +  ABE-"*"  for  any  B.  In  this  case 

rsJ  I  r^Kj  'Mj 


m“l/ 


.‘I 
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\ 


The  first  term  is  exactly  as  in  Proposition  A.3.5.  The  second  term  by 
the  same  reasoning  as  Propositions  A. 3. 6  and  A. 3. 4  is  less  than 


max 


max 


k=l,2, . . . 


MSoV)l 


2 


The third and fourth terms are equal and by one application of the
Cauchy-Schwarz Inequality, their squares are bounded by the product
of the first two terms. Thus bounds exist for all terms based on
Propositions A.3.5 and A.3.6.
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The  second  different  term  is  F  =  (T.,  -T0)T~  .  This  yields  a  decora- 

“"t  “1 

position  as  above  with  B  =(t-T~)A  A  (T-T_).  As  above  the  sauares  of 

rv-'X.  r^'d  ^ 

the  third  and  fourth  terms  will  be  bounded  by  the  product  of  the  first 
two  terms.  The  first  term  will  be  y'q/a'e"1^-,  -T0)A~  °A-1(T.,  -TOE^AQ  y  , 
which  is  exactly  the  same  term  that  is  bounded  in  Preposition  A. 3. 7.  The 
second  tern  will  be  (Tr^)V Vt^-Tg) (^-T 

it  is  easily  seen  that  this  term  will  be  bounded  by 

x'x  max  max  ^us  £l1  terins 

k=l,2,...,n  k=l,2,...,n 

are  bounded  by  previous  propositions. 

Bounds  have  now  been  found  for  all  necessary  inner  products; 
bounds  for  traces  are  covered  by  the  next  two  propositions. 

PROPOSITION  A. 3. 8.  If  E.. ,  E_  and  E„  are  any  nxn  matrices,  then  the 

-  /"Vj_  <~'£i  —■—■■■  i  ,  ■■  -y..  .  ■  M  t  .  ■  i  ■ 

following  statements  are  true. 


jtr  E.GJ.G.U  min(m.,m.)  max  jx.  (A'E., AA/E0A)  I  •  — — 

•  ~l~n-2~j  i  v  j  ,  ,  _  1  k' - 1 - 2~  a^. 


3  k=l,2, ...  ,n 


OiaOi 


jtr  E  G. E  G. E  G  |finin(m.  ,m.  ,m  )  max  jx,  (A/E1AA/E0AA/E  A) | - 

•  1  v  i ’  j’  k'  1  k - 1~~  '-2> - '3~  'cu  .au  .au. 

k=l , 2 , ; . . , n  Or  Oj  Ok 


PROOF.

The proof is given for the first case; the second is proved analogously. $E_1G_iE_2G_j$ has rank at most $\min(m_i, m_j)$ and hence has at most $\min(m_i, m_j)$ nonzero characteristic roots. Therefore

\[
\begin{aligned}
|\operatorname{tr} E_1G_iE_2G_j| &= \Bigl|\sum_{k=1}^{n} \lambda_k(E_1G_iE_2G_j)\Bigr| \\
&\le \sum_{k=1}^{n} |\lambda_k(E_1G_iE_2G_j)| \\
&\le \min(m_i, m_j) \max_{k=1,2,\ldots,n} |\lambda_k(E_1G_iE_2G_j)| .
\end{aligned}
\]

But

\[
\begin{aligned}
\max_{k} |\lambda_k(E_1G_iE_2G_j)|
&= \max_{k} |\lambda_k(U_j'A^{-t}A'E_1G_iE_2AA^{-1}U_j)| && \text{by Lemma B.11 and } G_j = U_jU_j', \\
&\le \lambda_{\max}(U_j'A^{-t}A^{-1}U_j) \max_{k} |\lambda_k(A'E_1G_iE_2A)| && \text{by Lemma B.8,} \\
&= \lambda_{\max}(\Sigma_0^{-1}G_j) \max_{k} |\lambda_k(U_i'A^{-t}A'E_2AA'E_1AA^{-1}U_i)| && \text{by Lemma B.11 again,} \\
&\le \lambda_{\max}(\Sigma_0^{-1}G_i)\,\lambda_{\max}(\Sigma_0^{-1}G_j) \max_{k} |\lambda_k(A'E_2AA'E_1A)| && \text{by Lemma B.8 again,} \\
&\le \max_{k} |\lambda_k(A'E_1AA'E_2A)| \cdot \frac{1}{\sigma_{0i}\sigma_{0j}} && \text{by Proposition A.3.2.}
\end{aligned}
\]
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The  following  choices  of  E^,  E, 2  and  E^  yield  the  cited  hounds  for 
k=l,2,...,n  ^ 


T? 

h 

~3 

T-1 

~1 

if 

rp"l 

~1 

t"1 

~1 

T"1 

~1 

Si^Si-Se^1 

T"1 

~1 

Efter&’S1 

Bound 


325 

77?  W 

i-Ojl, ... jP^ 

1286 2 

/ 

min  (n  o  )‘ 
i=0,l, ... jP^1 


5125 J _ 

“in  (no-)3 


PROOF. 

The first case is obvious. The second case simplifies to $\max_k |\lambda_k[T_1^{-1}(\Sigma_0 - T_1)]|$, which is bounded by Proposition A.3.2. In cases three and four Lemma B.14 applies, as it does in the sixth case. The fifth case is proved here for illustrative purposes.

max  jl.  (A/T:1AA,Tr1(T1-T„)T:1Aj=  max  |x.  (E-Tr^ThT"1^,  -T0)T"1| 
I  ~1  —  ~1  V~1  ~£'~2  ~l  .  ,  1  kv~0~l  ~0~1  ~1  ~2  ^  1 

k=l,2,...,n  k=l,2,...,n 


£  X 


2  (ilt"1 

max  ~0~1 


)  max  |xk  [(T  -T^)^1]] 
k=l,2,...,n 


by  Lemma  B.U, 
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m  =  dim{£(U.  )}  -  dim{£(U.  : 

s  ~i  ~p  J  l  ~i  _ 

s  *1  s+1 


:U  )}.  Since 


dim{<£(U.  )}  ^  m.  for  any  i  e  S  and  dimf£(U.  )]  ^  2  m., 

S  -I  s  .  *  d 

■  3eSs 

V  % 

it  follows  that  m.-  Z  m.  ^  m  £  Z  m.  for  any  i  e  S  .  But  then 
1  *  0  s  *  o  s 

3eSs-a  3eSs 

m.  -  Z  m. 

~  1  *  j 

m  m.  jeS 

-y-  is  bounded  by  Z  -f2-  -  Z  p..  <  ®  and - - - ♦  1-Z  p..=  1 

“i  .  s*  mi  JsS,  ^  mi  .  *  "a1 

JsSs  s  JsSs+l 


because  p..=  O  for  i  e  S  ,  j  e  S  .  Thus  each  m  has  the  same  order 

S  S+JL  S 

of  magnitude  as  each  nu  for  i  e  Sg.  It  is  thus  sufficient  to  show  that 


i-Ojls . • . s is+i~- 


n.=  n.  where  j  e  S  at  least  for  n  large.  However  for 
ins 


t  <  s  and  i  e  S  and  j  e  S,  ,  p.  .=  lim  — ®  by  definition  of  the 
s  t’  rji  _  m. 

0  n-*®  .  i 

sets  S  .  Therefore,  the  minimum  m.  and  hence  the  minimum  n.  must 
s  3  1  1 

eventually  equal  n.  for  some  j  e  S  ,  But  then  the  minimum  n.  and  m 

3  s  l  s 

have  the  same  order  of  magnitude  and  the  second  and  third  expressions 

are  both  bounded. 

For  the  last  two  statements  assume  without  loss  of  generality, 
that  i  2:  j  >;  k.  Then  i  s  S  ,  j  e  S  with  s  s  t.  If  s=t,  either  m.  or 

S  t  X 

m.  could  be  the  minimum  but  then  n.  and  n.  both  have  the  same  order  of 
J  i  j 


magnitude  and  n_.n.  has  the  same  order  of  magnitude  as 
1  J 
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min[m.  ,m. ]  >  min[m.  ,m.,im  ] .  Hence  the  first  expression  is  bounded  and 

1  J  1  J  -K 

since  -»  93  the  second  converges  to  zero.  If  s  >  t  then  nu  will 

eventually  be  the  minimum  and  n_.  and  are  both  of  greater  order  of 

magnitude  than  n..  Thus  both  expressions  converge  to  zero. 

Since  all  expressions  are  bounded,  a  common  bound  for  all  of 
them  can  be  chosen.  Ill 


A.4. Lemmae Used to Prove Theorem 4.4.1

This section contains the lemmae required to prove Theorem 4.4.1. Each lemma is stated and proved in a separate subsection. These lemmae are referred to in the proof of Theorem 4.4.1 given in Section 4.5.


A.4.1. Proof of Conclusion 4.4.1.i — The Positive Definiteness of J

LEMMA A.4.1. The $n \times n$ matrix $J$ defined by


[J],, 


r  .  / 

iim  ^  ■  . 

n-*=  *  i  ¥  o  jr=jK 


)  1,  i, 3=1,2,. .. ,p -is  -positive 


definite. 


PROOF.

It was shown in Section 4.3 that

\[
J = \begin{pmatrix} C_0 & 0 \\ 0 & C_1 \end{pmatrix}
\]

and that $C_0$ was positive definite by Assumption 4.2.5. It remains to show that the $(p_1+1) \times (p_1+1)$ matrix $C_1$ is positive definite, where

\[
[C_1]_{ij} = \frac{1}{2} \lim_{n \to \infty} \frac{1}{n_i n_j} \operatorname{tr} T_0^{-1}G_iT_0^{-1}G_j , \qquad i, j = 0, 1, \ldots, p_1 .
\]

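Let $b_0, b_1, \ldots, b_{p_1}$ be arbitrary constants, not all zero. It is required to show that

\[
\sum_{i=0}^{p_1} \sum_{j=0}^{p_1} b_i b_j (C_1)_{ij} > 0 .
\]

Now

\[
\begin{aligned}
2 \sum_{i=0}^{p_1} \sum_{j=0}^{p_1} b_i b_j (C_1)_{ij}
&= \sum_{i=0}^{p_1} \sum_{j=0}^{p_1} b_i b_j \lim_{n \to \infty} \frac{1}{n_i n_j} \operatorname{tr} T_0^{-1}G_iT_0^{-1}G_j \\
&= \lim_{n \to \infty} \operatorname{tr} T_0^{-1}\Bigl(\sum_{i=0}^{p_1} \frac{b_i}{n_i} G_i\Bigr) T_0^{-1}\Bigl(\sum_{j=0}^{p_1} \frac{b_j}{n_j} G_j\Bigr)
\end{aligned}
\]

because finite sums and limits and tracing interchange,

\[
\begin{aligned}
&= \lim_{n \to \infty} \operatorname{tr} \Bigl[ T_0^{-1}\Bigl(\sum_{i=0}^{p_1} \frac{b_i}{n_i} G_i\Bigr) \Bigr]^2 \\
&= \lim_{n \to \infty} \sum_{k=1}^{n} \lambda_k^2 \Bigl[ T_0^{-1}\Bigl(\sum_{i=0}^{p_1} \frac{b_i}{n_i} G_i\Bigr) \Bigr] .
\end{aligned}
\]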

Thus  the  object  of  attention  is  positive  for  any  finite  n  and  the  only 
problem  is  that  it  might  degenerate  in  the  limit.  The  proof  that  the 
limit  is  indeed  positive  proceeds  as  follows. 

Suppose $b_0 \ne 0$; then some of the characteristic roots of $T_0^{-1}\bigl(\sum_{i=0}^{p_1} \frac{b_i}{n_i} G_i\bigr)$ can be identified. Without loss of generality, write $T_0 = \Sigma_0 = \sum_{i=0}^{p_1} \sigma_{0i} G_i$ for the rest of this proof. To find some of the characteristic vectors in this case note that there is a space of dimension $\nu_0\ (= n_0^2)$ orthogonal to $\mathcal{L}(U_1 : \cdots : U_{p_1})$. (See Section 4.2.) Let an orthonormal basis for this space be $H_0$. Then $U_i'H_0 = 0$ and hence $G_iH_0 = 0$ for $i = 1, 2, \ldots, p_1$.

This  yields 
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i  -5 

of  these  must  be  the  largest  in  the  sense  that  lim  — *  — —  :£  1 

n.  u  I 

0  |bi| 

for  i  e  S  j  +  i.  Consider  this  i  fixed  and  now  consider  vectors 
s 

belonging  to  <£(IL  ) .  Also  without  loss  of  generality  let  b^>  0 
(clearly  b_.  t  0).  Now  use  Lemma  B.5  to  show  that  a  number  of 


•1  b. 


»1 \  f  -h  •  v 

characteristic  roots  of  E^  ^  2  G . J  have  a  lower  bound  of  the  proper 


3=1  3 

order  of  magnitude;  that  is,  consider 


inf 

xtO 

/■V  /N/ 

xeL 


/lb.  . 

x'  E  — i  G.  )x 
~  V.=0  n. 


x'Enc 


■where  L  is  a  subspace  of  £(U.).  Equivalently,  consider 


inf 

y$0 

r<J  r^J 


*1  b. 

E  — ^  G. 
n 

3=°  3  J 


E  o~  .G . 

o*=o  °« 


■  .  > 

where restrictions are placed on possible $y$ vectors to restrict consideration to the appropriate subspace. $\mathcal{L}(U_i)$ is a space of dimension $m_i$; the number of restrictions placed on $y$ will determine how many characteristic roots there are greater than the lower bound which is eventually arrived at.


First, note that for any $j \in S_{s+1}$ (recall $i \in S_s$) the matrix $U_j'U_i$ is an $m_j \times m_i$ matrix and hence has rank at most $m_j$ (because $m_j$ has smaller order of magnitude than $m_i$). Hence by restricting $y$ to an $m_i - m_j$ dimensional space it can be insured that $U_j'U_iy = 0$. Thus by restricting $y$ to a space of at worst (i.e. smallest) dimension $m_i - \sum_{j \in S_{s+1}} m_j$ it can be insured that $U_j'U_iy = 0$ and hence that $G_jU_iy = 0$ for $j \in S_{s+1}$. This restricts consideration to


inf 

y+0 

^  restricted 


y'uff  £  G.V-Y 


jess  J 


y'uff  £  „  a.  .G.V.y 


s+1 


p 

because  b  .=  0  for  j  I  S  .  How  note  that  y /ufG.U.y=y ,U(U.UfU.y=y ,D.y  > 

because  B.  is  a  nonsingular  diagonal  matrix  by  the  definition  of  th. 
Thus  the  expression  under  consideration  can  be  rewritten  as  follows: 


b.  b. 

—  y'ufG.U.y  +  £  -i  y'ufG.U.y 
1  j  e  S  j 


inf 

ytO 


J±L 


s 


a.  .y'ufG.U.y  +  a  v'ufU.y  +  a0.y'u:G.U.y 


/tt/t 


.t-rrtr 


y  restricted  .  „ 

^  0  f  S 


* 

s+1 
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2  i 

by  dividing  each  term  by  and  factoring  out  —  , 


1  +  Scj  n.b .  /r2 

b  JeSs  3  1  X  £iX 

=  inf  -i  - - - - - - 

vtO  ni  v/D.D"1TJ./U.U'U.D7Ti.y 

Y  restricted  2  crn.  ~  ^~i,^~p3~x~x 

J«H  °J  X^X 


b.n.  y^-D'^'U-U'U.bT-Sd.y 
/~2 

X£iX 


rs‘ylrl 


x%x 


b. 

inf  — 
£+0  ni 


b.n.  |/D71UfU.U'u.D71§ 

1  -f  2  3  1  ~  /no.  ** 

jeSo  J  x  ll 


§  restricted  2  . 

**  0.1 


5/d71tj./u.u7u.d71| 


c  /D7J'f 

/v  /*»Q_  «*£> 

Too  ITT- 


+  CToi 


s+1 

a^i 

j*o 


•where  §  =  D.y  , 
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h. 

s  inf  — - 
5+0  ni 


§  restricted 


1-2 

jeS 


lb.  In.  ^^V'U.U'U.lA 

IV^T  g'g 


.T^i 


,T\1ufu.iT,TT  T'“1* 


■g'D 


U.U.D.  § 


rfCi 

j4-i 

d-o 


0j 


/VX  **s-fl/,*»* J  ^ j 


£'£ 


+  0-_  +  O'- . 

00  Oi 


(**) 


because $\lambda_{\max}(D_i^{-1}) \le 1$ since each diagonal element of $D_i$ is greater than or equal to 1. Thus matrices of the form $D_i^{-1}U_i'U_jU_j'U_iD_i^{-1}$ must be studied.

Consider the trace of such a matrix.


\[
\operatorname{tr}(D_i^{-1}U_i'U_jU_j'U_iD_i^{-1}) = \operatorname{tr}[(U_j'U_iD_i^{-1})'(U_j'U_iD_i^{-1})] = \sum_{k=1}^{m_i} \sum_{l=1}^{m_j} \bigl[(U_j'U_iD_i^{-1})_{lk}\bigr]^2
\]

because $\operatorname{tr} A'A = \sum_i \sum_k a_{ik}^2$ for any matrix $A$. But now let the columns of $U_i$ be $u_1^{(i)}, \ldots, u_{m_i}^{(i)}$; then since $D_i = U_i'U_i$,

\[
[U_j'U_iD_i^{-1}]_{lk} = \frac{u_l^{(j)\prime} u_k^{(i)}}{u_k^{(i)\prime} u_k^{(i)}} .
\]

Observe that

\[
0 \le \frac{u_l^{(j)\prime} u_k^{(i)}}{u_k^{(i)\prime} u_k^{(i)}} \le 1
\]

because all columns contain only zeros and ones and hence the numerator counts matches of ones in $u_l^{(j)}$ and $u_k^{(i)}$ and the denominator counts the ones in $u_k^{(i)}$. Since there can be no more matches than there are ones in $u_k^{(i)}$ the inequality is true. Furthermore

\[
\sum_{l=1}^{m_j} \frac{u_l^{(j)\prime} u_k^{(i)}}{u_k^{(i)\prime} u_k^{(i)}} = 1
\]

because $\sum_{l=1}^{m_j} u_l^{(j)}$ is an $n \times 1$ vector of ones by definition of the $U_j$. Hence

\[
\sum_{l=1}^{m_j} \Bigl( \frac{u_l^{(j)\prime} u_k^{(i)}}{u_k^{(i)\prime} u_k^{(i)}} \Bigr)^2
\]

is a sum of squares of items all of which are between zero and one and which add up to one. Then the sum of squares has a maximum of one which occurs when one of the summands equals one and the others are zero. Otherwise the sum will be less than one (usually much less). At this point Assumption 4.2.4 quite naturally applies. It says that for $j \in S_s$, $j \ne i$ there exist constants $R_1$ and $R_2$ such that except for $R_1 m_i$ of the $u_k^{(i)}$, the quantities

\[
\sum_{l=1}^{m_j} \Bigl( \frac{u_l^{(j)\prime} u_k^{(i)}}{u_k^{(i)\prime} u_k^{(i)}} \Bigr)^2
\]

are less than $R_2$.
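The match-counting argument can be illustrated on a toy layout. The sketch below is an assumption-laden illustration, not the author's computation: the two factor labelings `factor_i` and `factor_j` and the helper names are hypothetical. For each column of the first factor it computes the ratios of matches to ones, and checks that they add up to one and that their sum of squares is at most one.

```python
def level_columns(labels):
    """Map each level of a factor to its 0/1 incidence column (one entry per observation)."""
    levels = sorted(set(labels))
    return {lev: [1 if lab == lev else 0 for lab in labels] for lev in levels}

def match_ratios(cols_j, col_k):
    """Ratios (matches of ones with col_k) / (ones in col_k), one per column of factor j."""
    ones_in_k = sum(col_k)
    return [sum(a * b for a, b in zip(col_l, col_k)) / ones_in_k
            for col_l in cols_j.values()]

# Hypothetical layout: 8 observations classified by two factors.
factor_i = ["a", "a", "a", "a", "b", "b", "b", "b"]
factor_j = ["p", "p", "q", "r", "p", "q", "q", "q"]

cols_i = level_columns(factor_i)
cols_j = level_columns(factor_j)

sums = {}
sums_of_squares = {}
for level, col_k in cols_i.items():
    ratios = match_ratios(cols_j, col_k)
    sums[level] = sum(ratios)                        # always 1: every observation has one j-level
    sums_of_squares[level] = sum(r * r for r in ratios)  # at most 1

# Adding the sums of squares over the columns of factor i gives the trace studied in the text.
trace_value = sum(sums_of_squares.values())
print(sums, sums_of_squares, trace_value)
```

In this toy layout the sums of squares are well below one for both columns, which is the situation the assumption is meant to capture.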


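Thus

\[
\begin{aligned}
\operatorname{tr}(D_i^{-1}U_i'U_jU_j'U_iD_i^{-1})
&= \sum_{k=1}^{m_i} \Bigl[ \sum_{l=1}^{m_j} \Bigl( \frac{u_l^{(j)\prime} u_k^{(i)}}{u_k^{(i)\prime} u_k^{(i)}} \Bigr)^2 \Bigr] \\
&\le (R_1 m_i) \cdot 1 + (m_i - R_1 m_i) R_2 \\
&= m_i [R_1 + (1 - R_1) R_2]
\end{aligned}
\]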


■where  N(S  )  is  the  number  of  indices  in  S  hy  Assumption  4.2.4.  Of 
s  s 


course,  for  j  |  S  the  hound  tr(D71ufu.U^U.D.“1)  £  m.  still  holds. 


m3  ,  u(j)'  Jl}  .2 

since  £  f  —fry1 — PT  )  ^  k=l,2, . . .  ,m. . 

*ml  2k  2k 


If $A$ is any positive semidefinite $m \times m$ matrix with $\operatorname{tr}(A) \le K_1 m$, it is clear that at most $\frac{K_1}{K_2} m$ of the characteristic roots of $A$ can be greater than $K_2$. (If they were, $\operatorname{tr} A > K_2 \cdot \frac{K_1}{K_2} m = K_1 m$, a contradiction.) This fact with $K_2 = 2p_1(N(S_s)+1)$ and $K_1 = 1$ implies that the


denominator  of  (**)  is  less  than  2p..(N(S  )+l)  £  a_ .+  o_.  if  § 

PL  s'  .i.*  Oj  00  Oi  ~ 

aelSs+l 
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l 

is  restricted  so  that  it  contains  no  part  of  the  — = —  m.  characteristic 

vectors  that  go  along  with  the  "offending"  characteristic  roots. 
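The counting fact about characteristic roots and traces, stated before the bound on the denominator of (**), can be checked on a small example. This is a minimal sketch under stated assumptions: the list `roots` is a hypothetical set of nonnegative characteristic roots, and `k1`, `k2` play the roles of $K_1$ and $K_2$.

```python
def count_large_roots(roots, k2):
    """Count the characteristic roots strictly greater than k2."""
    return sum(1 for r in roots if r > k2)

# Hypothetical nonnegative characteristic roots of a positive semidefinite m x m matrix.
roots = [3.0, 2.5, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0]
m = len(roots)

k1 = sum(roots) / m   # chosen so that trace = k1 * m holds with equality
k2 = 2.0

large = count_large_roots(roots, k2)
bound = k1 / k2 * m   # at most (K1 / K2) * m roots may exceed K2

print(large, bound)
```

If more than `bound` roots exceeded `k2`, their sum alone would already exceed the trace, which is the contradiction noted in the text.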
Furthermore  the  numerator  of  (**)  is  equal  to 


h  n. 

since  £  1  +  "T  for  a^'  ^eS  for  11  t,ey°n°-  some 

i  j  ^  s  S 


point  as  n  “  because  lim  L  ^n~  £  1  by  choice  of  i, 

n-*=  '  i  j  ' 


But  now 
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since  the  trace  of  a  stun  is  the  stun  of  the  traces  and  since  each 


individual  trace  is  bounded  above  and  there  are  N(Ss)-l  traces.  Now 
using  the  above  argument  about  traces  and  characteristic  roots  again 


with  Kg=  1 


N(Sg)  -  1  . 

r-y  and  K^=  \  +  ^  ,  if  £  is  further  restricted  so 

’s'  '  s' 


that  it  contains  no  cart  of  the  characteristic  vectors  associated  with 


characteristic  roots  of  £  U.U()U.D.'*'  which  are  greater  than 


~i 

jebg 


1  -  ow/q""!  then  the  numerator  of  (**)  is  greater  than 


(X  +  2n71 


/  >  0‘ 


It  is  now  necessary  to  count  the  number  of  restrictions  which 
have  been  made  on  §  at  this  point.  At  most  £  m.  were  made  for 


.the  first  restrictions  (when  ?  was  y);  at  most 


K1 

£  ~  m. 

3K+1  2 


Li.  . 

v  "y+1y  (because  there  are  at 


)f^s+l 


most  p^  terms  in  the  summation)  more  restrictions  were  made  to  bound  the 


^  (N(S  )-l)m. 

denominator;  at  most  —  m.  =  - — - 1 - 

^  1  (KfS^-Dd-g^-) 


more  restrictions  were 
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made  to  bound  the  numerator.  Thus  the  total  number  of  restrictions  is 
less  than  or  equal  to  the  sum  of  these  numbers  and  the  dimension  of  the 
linear  space  oyer  ■which  the  inf  may  be  taken  will  be  greater  than  or 
equal  to  nu  minus  this  sum.  In  fact  this  dimension  is  greater  than 
or  equal  to 

m.(N(Ss)-l) 

”i  2(E(Ss,+1)  '  «SS>+Ddw))" 


r  1  g(SeM  , 

“n.1  2(ll(Ss)+l  (H(Ss)+l)(l-  Jfej)  :=3es*  “i-1 

s 


2(F(Ss)+l)(l-  gl 


-  z 


(£)] 


.  r\~+~  »  4-u. „ 

•>sSs+l  1 


i  * 

But  ^  0  as  n  -*  o  for  each  j  e  so  that  for  n  large  enough  the 

i 


• '  . >  ■ 
last  expression  in  brackets  is  greater  than  K=  s  - 

•J  (  t.t/  o  \  \  (  n 


v-  ], 

X  \  J 


3  (^(Ss  )+1)(1-^)) 


■which  is  positive. 
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Kow  combine  the  bounds  obtained  on  the  numerator  and  denominator 
of  (**)  to  obtain 


1  -  E 


b  .n. 


inf 
£+0 
5  restricted 


x 

n. 


•  c  i  b  .n  . 
jeSs  1  3 

3*i 


?,D?1ufU.U'U.DT1§ 


5,d71u.,u.u'u.d7'ls 
- f£" -  +  °00  +  a0i 


n  o„ . 
^Ss+1 

j+i 


1 

M»(s/) 

^l«V+x).  *  <V  W  ° 

^Ss+i 


Oi 


j+i 

j+0 


Kh>0. 


Now Lemma B.5 states that there are at least $K_3 m_i$ characteristic roots of

\[
\Sigma_0^{-1} \Bigl( \sum_{j=0}^{p_1} \frac{b_j}{n_j} G_j \Bigr)
\]

which are greater than or equal to
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This  yields 


•1  b 


„2 


3=0  j  n. 


.2 

% 


2  ? 
-  ^  \hKb 


12  2 
"*  '  5T  bi  K3  K4  >  0  * 

r r<=  x 


Thus for all choices of $b$, $b'C_1b > 0$ and hence $C_1$ is positive definite. This proves that $J$ is positive definite and concludes the proof of Lemma A.4.1. |||
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A. 4.2.  Verification  of  Condition  3*3.1. ii — Asymptotic  Normality  of 


9X 

Ollf 


Mon 


LEMMA  A. 4.2.  For  X  (jr,£)  >  £  as  defined  in  Section  4.5  . 


a u&i) 

ai 


Mo n 


2p<°>£> 


PROOF. 


Let  the  vector 


9X 

ai 


i=ion 


=  a 
~n 


a<°> 

~n 


(  )  >  'where  a_^  is  a 


mi 


PqXI  vector  defined  by 


a<°> 

mi 


9X 

Sg 


i=ion 


-^s's'oVxr2-). 


Px+1 


pl+1 


and  a/1)  is  a  (p1+l)xl  vector  defined  by 


[.W],  -fM 

lmi  ji  9a.  ,  , 

1  Mon 
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are  re- 
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a<°> 


n 


— —  X;A_tz  , 


Pl+1 


Am*  /v 


[apV  =  (z'A^G.A^z  -  tr^G.),  1=0,1, ...  ,p_ . 

~EL  1  2Tl .  ~  ~  ~1~  ~  -O  ~<L  1 

1 

Let $\delta = (\delta^{(0)\prime}, \delta^{(1)\prime})'$ be partitioned the same as $a_n$. It is then sufficient to show that for any choice of $\delta$, $\delta' a_n$ is asymptotically normal with mean zero and variance $\delta' J \delta$.


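Let

\[
\begin{aligned}
W(\delta, n) = \delta' a_n
&= \sum_{i=0}^{p_1} \frac{\delta_i^{(1)}}{2n_i} \bigl[ z'(A^{-1}G_iA^{-t})z - \operatorname{tr} \Sigma_0^{-1}G_i \bigr] + \frac{1}{n_{p_1+1}} \delta^{(0)\prime} X'A^{-t}z \\
&= z'F(\delta, n)z - \operatorname{tr} F(\delta, n) + f'(\delta, n)z ,
\end{aligned}
\]

where

\[
F(\delta, n) = A^{-1} \Bigl( \sum_{i=0}^{p_1} \frac{\delta_i^{(1)}}{2n_i} G_i \Bigr) A^{-t}
\quad \text{and} \quad
f(\delta, n) = \frac{1}{n_{p_1+1}} A^{-1}X\delta^{(0)} .
\]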


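Now calculate the characteristic function of $W(\delta, n)$. For each $n$ there exists an orthogonal matrix $P(\delta, n)$ and a diagonal matrix $\Lambda(\delta, n)$ ($[\Lambda(\delta, n)]_{kk} = \lambda_k$) such that $F(\delta, n) = P'(\delta, n)\Lambda(\delta, n)P(\delta, n)$. Of course, $\Lambda$ contains the characteristic roots of $F$ and $P$ the characteristic vectors; the decomposition is possible because $F$ is symmetric.

Then the $n \times 1$ vector $w(\delta, n) \equiv P(\delta, n)z \sim N(0, I_n)$ and $W(\delta, n) = w'(\delta, n)\Lambda(\delta, n)w(\delta, n) - \operatorname{tr} \Lambda(\delta, n) + g'(\delta, n)w(\delta, n)$, where $g(\delta, n) \equiv P(\delta, n)f(\delta, n)$ and $\operatorname{tr} F(\delta, n) = \operatorname{tr} \Lambda(\delta, n)$. Now the dependence on $\delta$ and $n$ is suppressed in the notation, yielding $W = w'\Lambda w - \operatorname{tr} \Lambda + g'w$ where $w \sim N(0, I_n)$. The characteristic function of $W$, $\phi_W(t)$, is given, for any $t$, by

\[
\begin{aligned}
\phi_W(t) &= E_0[e^{itW}] \\
&= E_0[e^{it(w'\Lambda w - \operatorname{tr} \Lambda + g'w)}] \\
&= e^{-it \operatorname{tr} \Lambda} E_0[e^{i(tg'w + tw'\Lambda w)}] .
\end{aligned}
\]

Thus Lemma B.2 applies, yielding

\[
\phi_W(t) = e^{-it \operatorname{tr} \Lambda} \, |I - 2it\Lambda|^{-1/2} \exp\{ -\tfrac{1}{2} t^2 g'(I - 2it\Lambda)^{-1} g \} .
\]

Then

\[
\begin{aligned}
\log \phi_W(t) &= -it \operatorname{tr} \Lambda - \tfrac{1}{2} \log |I - 2it\Lambda| - \tfrac{1}{2} t^2 g'(I - 2it\Lambda)^{-1} g \\
&= -it \sum_{k=1}^{n} \lambda_k - \tfrac{1}{2} \sum_{k=1}^{n} \log(1 - 2it\lambda_k) - \tfrac{1}{2} t^2 g' \Bigl( \sum_{j=0}^{\infty} (2it\Lambda)^j \Bigr) g
\end{aligned}
\]

where the last expansion is valid so long as $\max_{k=1,2,\ldots,n} |\lambda_k| < \frac{1}{2t}$, which is true for $n$ sufficiently large. Note that

Continuing the expansion of $\log \phi_W(t)$,

\[
\log \phi_W(t) = -it \sum_{k=1}^{n} \lambda_k - \tfrac{1}{2} \sum_{k=1}^{n} \log(1 - 2it\lambda_k) - \tfrac{1}{2} t^2 g' \Bigl( \sum_{j=0}^{\infty} (2it\Lambda)^j \Bigr) g .
\]

The second term is

\[
-\tfrac{1}{2} \sum_{k=1}^{n} \log(1 - 2it\lambda_k)
= \tfrac{1}{2} \sum_{k=1}^{n} \sum_{j=1}^{\infty} (2it\lambda_k)^j \cdot \frac{1}{j}
= \tfrac{1}{2} \sum_{k=1}^{n} \Bigl[ 2it\lambda_k - 2t^2\lambda_k^2 + \sum_{j=3}^{\infty} (2it\lambda_k)^j \cdot \frac{1}{j} \Bigr]
\]

while the last term is

\[
-\tfrac{1}{2} t^2 g' \Bigl( \sum_{j=0}^{\infty} (2it\Lambda)^j \Bigr) g
= -\tfrac{1}{2} t^2 g'g - \tfrac{1}{2} t^2 \sum_{j=1}^{\infty} g'(2it\Lambda)^j g .
\]

Combining terms,

\[
\log \phi_W(t) = -\frac{t^2}{2} \Bigl[ 2 \sum_{k=1}^{n} \lambda_k^2 + g'g \Bigr]
+ \tfrac{1}{2} \sum_{k=1}^{n} \sum_{j=3}^{\infty} (2it\lambda_k)^j \cdot \frac{1}{j}
- \tfrac{1}{2} t^2 \sum_{j=1}^{\infty} g'(2it\Lambda)^j g .
\]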


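But recall that

\[
\begin{aligned}
g'g &= f'P'Pf \\
&= f'f \\
&= \frac{1}{n_{p_1+1}^2} \delta^{(0)\prime} X'A^{-t}A^{-1}X \delta^{(0)} \\
&= \frac{1}{n_{p_1+1}^2} \delta^{(0)\prime} X'\Sigma_0^{-1}X \delta^{(0)} \\
&\to \delta^{(0)\prime} C_0 \delta^{(0)}
\end{aligned}
\]

by Assumption 4.2.5. Furthermore

\[
\begin{aligned}
2 \sum_{k=1}^{n} \lambda_k^2 &= 2 \operatorname{tr} \Lambda^2 = 2 \operatorname{tr} F^2 \\
&= 2 \operatorname{tr} \Bigl( A^{-1} \sum_{i=0}^{p_1} \frac{\delta_i^{(1)}}{2n_i} G_i A^{-t} A^{-1} \sum_{j=0}^{p_1} \frac{\delta_j^{(1)}}{2n_j} G_j A^{-t} \Bigr) \\
&= \sum_{i=0}^{p_1} \sum_{j=0}^{p_1} \delta_i^{(1)} \delta_j^{(1)} \frac{1}{2n_in_j} \operatorname{tr} A^{-1}G_iA^{-t}A^{-1}G_jA^{-t} \\
&= \sum_{i=0}^{p_1} \sum_{j=0}^{p_1} \delta_i^{(1)} \delta_j^{(1)} \frac{1}{2n_in_j} \operatorname{tr} \Sigma_0^{-1}G_i\Sigma_0^{-1}G_j \\
&\to \sum_{i=0}^{p_1} \sum_{j=0}^{p_1} \delta_i^{(1)} \delta_j^{(1)} (C_1)_{ij} \\
&= \delta^{(1)\prime} C_1 \delta^{(1)} .
\end{aligned}
\]

Thus since $J = \begin{pmatrix} C_0 & 0 \\ 0 & C_1 \end{pmatrix}$ the first remaining terms of $\log \phi_W(t)$ converge to $-\tfrac{1}{2} t^2 \delta' J \delta$, which is what is required for normality. It remains to show the last two terms converge to zero.

Now

\[
S_1 = \Bigl| \sum_{k=1}^{n} \sum_{j=3}^{\infty} (2it\lambda_k)^j \cdot \frac{1}{j} \Bigr| \le \sum_{k=1}^{n} \sum_{j=3}^{\infty} (2t|\lambda_k|)^j .
\]

As previously noted $\max_k |\lambda_k| \to 0$ as $n \to \infty$ so that $2t \max_k |\lambda_k| < \tfrac{1}{2}$ for $n$ large enough and therefore

\[
\begin{aligned}
S_1 &\le \sum_{k=1}^{n} \frac{2^3 t^3 |\lambda_k|^3}{1 - 2t|\lambda_k|} \\
&\le 16 t^3 \sum_{k=1}^{n} |\lambda_k|^3 \\
&\le 16 t^3 \max_k |\lambda_k| \sum_{k=1}^{n} |\lambda_k|^2 .
\end{aligned}
\]

But $\sum_{k=1}^{n} |\lambda_k|^2 \to \tfrac{1}{2} \delta^{(1)\prime} C_1 \delta^{(1)} < \infty$ as previously noted and is therefore bounded, and $\max_k |\lambda_k| \to 0$ so that indeed $S_1 \to 0$.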
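The geometric-tail bound used for $S_1$ can be checked numerically for a single root. The following is a sketch under stated assumptions: the values of `t` and `lam` are hypothetical and chosen so that $2t|\lambda| < 1/2$, and the infinite tail is approximated by a long partial sum.

```python
def tail_sum(x, terms=200):
    """Partial sum of x**j for j = 3, ..., terms + 2 (a stand-in for the infinite tail)."""
    return sum(x ** j for j in range(3, terms + 3))

t = 1.5
lam = 0.1                 # hypothetical characteristic root with 2*t*|lam| = 0.3 < 1/2
x = 2 * t * abs(lam)

tail = tail_sum(x)
closed_form = x ** 3 / (1 - x)               # exact value of the infinite geometric tail
crude_bound = 16 * t ** 3 * abs(lam) ** 3    # uses 1 / (1 - x) < 2 when x < 1/2

print(tail, closed_form, crude_bound)
```

Summing the crude bound over all roots and factoring out the largest root gives the last inequality in the chain for $S_1$.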

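\[
\begin{aligned}
S_2 &= \Bigl| \sum_{j=1}^{\infty} (2it)^j g'\Lambda^j g \Bigr| \\
&= \Bigl| \sum_{j=1}^{\infty} (2it)^j f'P'\Lambda^j P f \Bigr| \\
&= \Bigl| \sum_{j=1}^{\infty} (2it)^j f'F^j f \Bigr| \\
&\le \sum_{j=1}^{\infty} 2^j t^j |f'F^j f| \\
&\le \sum_{j=1}^{\infty} 2^j t^j f'f \, \lambda_{\max}^{1/2}[(F^j)'F^j]
\end{aligned}
\]

by Lemma B.12. But $f'f \to \delta^{(0)\prime} C_0 \delta^{(0)} < \infty$ and is therefore bounded and $F$ is symmetric so that

\[
\lambda_{\max}[(F^j)'F^j] = \lambda_{\max}(F^{2j}) = \max_k |\lambda_k(F)|^{2j}
\]

by Lemma B.14, so that

\[
\lambda_{\max}^{1/2}[(F^j)'F^j] = \max_k |\lambda_k|^j .
\]

But again this converges to zero so that it may be assumed $2t \max_k |\lambda_k| < \tfrac{1}{2}$ and therefore

\[
\begin{aligned}
S_2 &\le f'f \sum_{j=1}^{\infty} (2t \max_k |\lambda_k|)^j \\
&= f'f \, \frac{2t \max_k |\lambda_k|}{1 - 2t \max_k |\lambda_k|} \\
&\le 4t \, f'f \max_k |\lambda_k| \\
&\to 0 .
\end{aligned}
\]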


Thus $\log \phi_W(t) \to -\tfrac{1}{2} t^2 \delta' J \delta$ and since $J$ is positive definite the limit is a legitimate characteristic function, continuous at $t = 0$ and $\delta = 0$. In fact $e^{-\frac{1}{2} t^2 \delta' J \delta}$ is the characteristic function of a random variable distributed as $N(0, \delta' J \delta)$. This proves that $W(\delta, n)$ converges in distribution to $N(0, \delta' J \delta)$ and, since $\delta$ is arbitrary, that $a_n$ converges in distribution to $N(0, J)$, which was to be proved. |||
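The characteristic-function argument in the proof of Lemma A.4.2 can be illustrated numerically. The sketch below evaluates the exact log characteristic function of $W = w'\Lambda w - \operatorname{tr}\Lambda + g'w$ for independent standard normal components and compares it with the Gaussian value $-\tfrac{1}{2}t^2(2\sum\lambda_k^2 + g'g)$. The particular roots `lam` and coefficients `g` are hypothetical choices that shrink like $n^{-1/2}$; they are assumptions made only for the illustration.

```python
import cmath

def log_cf(t, lam, g):
    """Exact log characteristic function of W = w'Lw - tr(L) + g'w with w ~ N(0, I),
    where L is diagonal with entries lam. Each coordinate contributes independently."""
    total = complex(0.0, 0.0)
    for lk, gk in zip(lam, g):
        denom = 1 - 2j * t * lk
        total += -1j * t * lk - 0.5 * cmath.log(denom) - 0.5 * t * t * gk * gk / denom
    return total

def gaussian_log_cf(t, lam, g):
    """Log characteristic function of N(0, 2*sum(lam^2) + g'g)."""
    variance = 2 * sum(l * l for l in lam) + sum(x * x for x in g)
    return -0.5 * t * t * variance

n = 10000
lam = [0.5 / n ** 0.5] * n   # hypothetical roots: max |lam| -> 0, sum(lam^2) stays at 0.25
g = [1.0 / n ** 0.5] * n     # hypothetical linear coefficients with g'g = 1

exact = log_cf(1.0, lam, g)
limit = gaussian_log_cf(1.0, lam, g)
gap = abs(exact - limit)     # this is the combined effect of the two remainder sums

print(exact, limit, gap)
```

With these choices the gap is of order $n^{-1/2}$, in line with the bounds on $S_1$ and $S_2$, which are both proportional to the largest root in absolute value.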
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„  ,,r  a2x(&i) 

i  L  dpBT. 


] 


i'ion 


-  T~ - l'x'i:1C.T~1(y-X  — I 


i  px+l 


V1 


from  Section  4.3, 


SX— -  I'x'^SiS^rSo). 

1  P-,+1 


using  the  same  substitutions  as  in  Section  A. 4. 2. 
Bien  c?o{0i(£)}  =  0  and 


Varo{0.(£)} 


2  2 

n.n  . 
i  Px+1 


l'£'S12iS]I0So12iS3S 


2  2 
n.n 
l  Px+1 


5/X/E"1G.S~1G.S“1X| 

^  o-<j  r*Qr*0  ^lr^O  rZ/z* 


•“  rvt  J,  -In  ,-t.-l_  .-t.*-l 


2  2 
n.n 

i  p  +1- 


f'X'A  A  G.A  A  xG.A"°A"-GCc 

rs-'  ^  ^  r*J  *•*»/ 


(Recall that $\Sigma_0 = AA'$.)


n 


r5'x'A-VVl  i  x  ( A-  XG .  A-tA-1G .  A" t ) 

L~  ~  ~  ~  2  max  ~  ~  ' 


V1 


n. 

i 


by  definition  of  characteristic  root. 


-  -th  [i's'S's.]  -h  W&V 


Px+1 


by  definition  of  A  and  by  Lemma  B.$, 


20b 


S'5 

2  _ 
n.  n 


I"  -5—  X  (x'sT-Sc)!  X  (E~\i. )‘ 
L  2  max  ~  ~0  ~  J  max  >~0 


P-L+l 


again  by  definition  of  characteristic  root. 


*Jt-.  1 


2  2 
n.  cr_. 

i  0i 


by  Propositions  A. 3. 2  and  A. 3. 10.  The  last  expression  converges  to 

2 

zero  as  n  -»  «>  because  and  converges  to  infinity  by  Assump¬ 

tions  4.2.1  and  4.2.3. 


f 


l"0 


It  remains  to  show  that  Var_  i  —5 — 5— — 

0  1.  ot.ot .  ,  ,  ,  . 

1  J  ti on 

i, j=0,l,...,p^.  But 


r  9  |  1  fir  -1-1 

Var  |  -  v— -  \  =  Var  j  tr  En  G.E_  G, 

0‘ L  ot.ot.  I  ,  .  J  01  2n.n .  L  ~0  ~x~0 

1  ^  'ArAon  1  0 


as  n  -+  00 


using  the  same  substitutions  as  above, 


by  Lemma  B.l, 


2 


2  2  “■ ~i^0  ~0 

n.n .  ° 

i  3 


2(min[m  ,m  ]) 

£  - — J —  X  (  E'  g  .  SlHG . ) 

2  max  ~0 


2 

n.n . 

i  3 


because there are at most $\min[m_i, m_j]$ nonzero characteristic roots of $\Sigma_0^{-1}G_i\Sigma_0^{-1}G_j$,


by  Lemma  B.9, 


by  Lemma  B.13, 


2(min[m  ,m  ]) 

- 5-5 — 3 — X2  (A~1G.A_XA~1G  .A-t) 

2  2  >nax'~  ~i~  ~ 

n.n.  0 

1  3 


2(min[m  ,m  ])  ,  ?  , 


2  2 
n .  n . 
1  3 


X  (A"xG.A“x)X  (A"XG. 
max  ~  ~i~  '  max  ~  ~j 


P  min[m.  ,m.] 


or  03 

by  Lemma  B.9  and  Proposition  A. 3.2, 

.  0 

* 

by  Proposition  A. 3. 10. 

'  r  b2. 

Thus  in  all  cases,  Var,_  i  t-t  £ , ■  - 
J  0  1 

to  be  proved,  j  j ] 


2  2  2  2 
n.n, 

l  o 


j-  -*  0  as  n  -*  so. 


■which  was 


On 


V 
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independent  of  But  T’1-  T^1  »  T’1^-  and  g1.  A  V1  as 


usual;  then 


/  = 


T~  ll  S'fti1-  Stx^l2 


<IiJP<S05e>[  ~r~  WS^l^A'g^-^SlV 


Pl+1 


by  Proposition  A.  3.4  and  the  fact  that  T^1-  T^1  is  symmetric.  But 


UA'SX^)W  £  max  |lv[T:1(S^-Tj]|2 

max - 0  ~0  ~jl  ~1  ~  ,  p  1  kL~l  '<~0  ~ ]/  J  • 


"by  Letmaae  B.ll  and  B.l4, 


min  (n.cr0.)‘ 
i—0^1^ « • • 


hy  Proposition  A.3.2.  — ^ —  X  (X^eJSc)  is  bounded  by  A.3.10  and 

^  max  ~ 

n  .  «* 

pi+1 


thus  0  <;  - - 7 

._n  .  ^VW' 

1—0,1, ...  ,p^ 


(B  is  taken  from  Proposition  A.3.10)  and  hence 


0  -*  0  independent  of  jr^.  (Since  none  of  the  bounds  or  convergence 
rates  depend  on  jr^. ) 


For  the  next  set  of . derivatives. 


afx 

B  T-SP 
1  ~ 


-i* 

~an 


n.n 
1  p,+l 


x/t“1g.t“1(v-x  - 

+**  ~cL  ~  n 


pi+1 


i=0,l, . . .  ,p  ,  a^O,l. 


Therefore,  it  is  sufficient  to  show  for  these  derivatives  that  for  all 
§  p^Xl  such  that  §  'f  =  1  that 

/-*/  (j 


n.n 

l  Pl+1 


§/X,rT“1G.T"1(v-X  — — — ) -E~^G . D”1  (y-X  )~|j  -  0 

~  ~  L~1  'Vi'^’l  ~  ~  n  _  ~0  ~i~0  vx.  ~  n  ,  y  J 


P-j+1 


Pl+1 


independent  of  ^1r|.  Lemma  B.l6  applies  here  and  thus  it  is  sufficient 
to  show  that  each  of  the  following  two  terms  goes  to  zero  independent 


of  im* 


0,=  — - -  Ii'x'^G.T:1-  2"1G.2"1)(y-»y-)|, 

1  n.n  ,  n  ~  V~1  ~i~l  -~o  ~i'~0  ~-0  1 

i  Pl+1 


(Recall  that 


lo 


n 


P-j+l 


“2o-> 


i  =  — - —  Ie'x't'L.t'L  ^~°  ~lJ 

* 2  n.n  , ~  ~1  ~i~i  ~  n  .  .. 
l  px+l  px+l 


Now 


02  £  JL'e' 
v2  2 
n. 

l 


'CMPbr- 

n 


Pi+1 
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by  Proposition  A.  3.4, 


by  Propositions  A.3.1,  A.3.2,  A.3.3,  A.3.10, 


*  ~  p  b^  1 6 

n.  a  . 

x  Oi 

1=0,2., . , .  ,p^, 

independent  of  . 

Now a technique is developed which is used often in this and subsequent sections. $T_1^{-1} = \Sigma_0^{-1} + T_1^{-1} - \Sigma_0^{-1} = \Sigma_0^{-1} + \Delta$, where $\Delta = T_1^{-1} - \Sigma_0^{-1}$. As noted previously, $\Delta$ is symmetric and $\Delta = T_1^{-1}(\Sigma_0 - T_1)\Sigma_0^{-1}$; thus

\[
T_1^{-1}G_iT_1^{-1} - \Sigma_0^{-1}G_i\Sigma_0^{-1} = \Sigma_0^{-1}G_i\Delta + \Delta G_i\Sigma_0^{-1} + \Delta G_i \Delta .
\]

$\phi_1$ then breaks down into three terms, each of which fits into the following formula from Proposition A.3.4 with appropriate choice of $F_1$ and $F_2$.

/vx  xc 

n. 
i 


'zl-o1-  ■  x  (x'e"1x)\  (a'f'aa'f  a) 

4  2  max  ~  ~0  ~  J  max' - ~i~' 


Px+1 


f— i- 

lVp1+i 
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£  ~  B  \  (A/FnAA/F,A)(y-X^^)/F'A"tA“1F Jy-YaJ 

2  max  ~  ML~~  ~1~'  ^>L  ~~o' 


by  Proposition  A. 3- 10.  Now  fit  in  the  —  where  appropriate  with  F  and 


F_  to  obtain  the  division  shown  in  Table  A. 4. 4.1. 


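Table A.4.4.1

Division of $\phi_1$ into Terms with Appropriate $F_1$ and $F_2$

Term   $F_1$                                                              $F_2$

1      $\frac{1}{n_i} \Sigma_0^{-1}G_iT_1^{-1}$                            $(\Sigma_0 - T_1)\Sigma_0^{-1}$

2      $T_1^{-1}(\Sigma_0 - T_1)\Sigma_0^{-1}$                             $\frac{1}{n_i} G_i\Sigma_0^{-1}$

3      $\frac{1}{n_i} T_1^{-1}(\Sigma_0 - T_1)\Sigma_0^{-1}G_iT_1^{-1}$    $(\Sigma_0 - T_1)\Sigma_0^{-1}$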

Proposition A.3.3 now yields the following bounds for $\lambda_{\max}(A'F_1'AA'F_1A)$, which are given in Table A.4.4.2.


Table  A.  4. 4. 2 


Bounds  for  X  (A/F1AA/FnA)  with  F.  from  Table  A.4.4.1 
max'~  ~1- — -  ~1~  ~1  * 


Term 


Bound  for  X  (a'f'AA'F.,A) 
max  ~  ~l~~  ~l~ 


2  2 
n.ou. 
l  Oi 


4b‘ 


min  (n.an.) 


j~ Oj 1, ... sP^ 


•  a  V  /"N  a  . 

3  01 


l6b‘ 


2  2  .  r  v 

n.CT  unn  (  n  ,CT„  . ) 

1  J 


All  these  bounds  go  to  zero  independent  of  jr^;  thus  it  is  sufficient 

to  show  that  (z-SIq)  is  bounded  independent  of  jr^ 

for  the  two  possible  choices  of  Fg.  The  remarks  preceeding  Proposition 

A.  3. 5  show  that  it  is  sufficient  to  show  that  v/8/A/F'A  UA  '4?_AQ  w  is 

~  ~2~-Ss~s 

bounded  for  s=0,l,...3c.  But  propositions  A.3.5  sad  A., 3.6  apply  here 
and  together  with  Conditions  A. 2.1  and  A. 3.1  they  yield  the  following 


bounds 
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It  is  sufficient  for  these  derivatives  to  show  that 


|  tr  T^G.T^G,  -  tr  E‘XG.E"XG  .  | 
0  dXl  .  H  .  ’  XL  ''-Q.XL  rs^J  X3  rsSlr^jQ  • 

a  1 


-1„  ^"1. 


and 


0,  =  — 1  (y-X  -•-~1  ••)  'l^G  •  .T^fy-X  ~1 
*x  n.n. 1  o-*  n  n  j  XL  ^<lXL  xl \X  *■*-» 

pi 


i  J  pn+l  P-j+1 


-(z-X^~ -)'  S1giS123S1(z-£iX)l 


P-j+1 


Pl+1 


each  converge  to  zero  independent  of  jr^  as  long  as  Conditions  A.  2.1 
and  A.3.1  are  true. 

For the first term write $T_1^{-1} = \Sigma_0^{-1} + \Delta$. Since $\operatorname{tr} C + \operatorname{tr} D = \operatorname{tr}(C + D)$ for any matrices $C$ and $D$, write $\phi_0$ as three terms and bound each separately. Each is of the form $\frac{1}{2n_in_j} |\operatorname{tr} E_1G_iE_2G_j|$ and hence Propositions A.3.8 and A.3.9 apply to give the following bounds.
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Table  A. 4. 4.4 

Bounds  Used  to  Demonstrate  Convergence  to  Zero  of  0. 


Term 


Bound  for  -z -  tr  E..G.E0G. 

2n.n.‘  ~l~4'-2~o 1 

i  J 


T-i(s.T)E-i  b  _ 

~i  -1  ~°  ninj  „_„fn  <W,aoiCToj 

0,1} ••• 


2  ^(So-SiJg1 


mln(m.  ,m_.) 


“in  KCT0k)cr0iCT0j 

k=0,l,...}P1 


,  .  .  min(m.  ,m.)  2 

3  Si  fSo-Si’S  <5o-£i>S  — frH - : — ? - 

1  j  „  “  KV  .°oiao3 


Again  Proposition  A. 3. 10  guarantees  that  all  these  bounds  converge  to 
zero  independent  of  jr^  and  hence  0  -»  0  independent  of  jr^. 

Handling  0^  is  fairly  messy.  Lemma  B.l6  applies  giving  four  terms: 

*ii  =  a^rlfe-SoJ'CS^SiSi^Ei1- 


*12 =  tttt  I  —  x'i:1G.T:1G.T:1(Jr-}sI)|> 

12  n .  n  .  1  n  , .  ~  ~l  ~o.~l  ~n~l  ~  ~~  1 

n  ~r\  4-1  u 


i  o  px+i 
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^13'5^71(2-Slo)'Sgi£i^ 


(  q  _q 

4,  m-l/i  m-l^r  ^0  ~1 


1  .1 


’j~l  ~  n 


P1+1 


0l4  2n.n . 

i  0 


(,§0-£l)  ,  _i  _i  I 

—  -  —  X'T,  G,T,  G,T,  a  — — ~  j 


„  «•*  -i  v«  •  -i-  -,  v-»  ,  a.  _  ji. 

n  , ,  ~  ~1  ~i~l  ~j~l  ~  n 


'P-j+1 


'P-L+l 


0^  is  easy  to  dispose  of. 


4 s  rib  C<ao-ftL>'«Io-&.>]  br~  WS'S^)]2 

i  j  np,+l 


*  XTnaY(A,T"1G.T"1AA.,T"1G.T:1G  .T^A)- 
max  ^  'vX  /v'j/v<L  /v// 


by  Broposition  A. 3.4, 


<  1  AUi  1 

„22P0bB2  22 
2n.n .  cr^.o^. 

10  Oi  Oo 


by  Propositions  A.3.3,  A.3.2,  A.3.1,  and  A.3.10.  This  certainly 
converges  to  zero  independent  of  Jr^  . 

Now use Proposition A.3.4 again with $F_1 = T_1^{-1}G_iT_1^{-1}$ (which is symmetric) and $F_2 = G_jT_1^{-1}$ to obtain


^  1 


12  ~  2  2 
n.n . 
i  J 


( 3^-3-,  )  ^(0^.-3-. )[~  -J1—  X  (X'Z:1*)  |  X  (A'T^G.TXa)5 
~-0  ~L  ~0  4  L  2  maxN~  ~0  max  ~  — -1  ~i~l  ~ 

n 


pi+i 


•  (x-Sq)  'sibb V 
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4  p=a  •  (^)2(Z-so)'[  %  ]  *V  r  w 


Since  the  first  part  clearly  goes  to  •  z§ro  it  is  sufficient  to  bound 

7f/.4  “A  A  (y-kcrj  for  F  =  — -  G.T.\  But  since  Conditions  A.2.1 
*->  ~  >o  ~*-q'  n .  ~j~L 

U 

and  A.3.1  true,  the  remarks  following  Proposition  A.3.7  show  -that 
all  the  necessary  bounds  were  obtained  while  working  on  the  terms  for 


-A 

xg  ;  these  bounds  are  contained  in  Table  A. 4.4. 3,  This  all 

j°~ 

guarantees  that  0ip  -»  0  as  n  -»  »  independent  of  jr^  n  as  long  as 

Conditions A.2.1 and A.3.1 are true. $\phi_{13}$ obviously follows the same bounds as $\phi_{12}$ and hence also converges to zero.

To handle $\phi_{11}$ use the fact that $T_1^{-1} = \Sigma_0^{-1} + \Delta$ and hence

$$T_1^{-1} G_i T_1^{-1} G_j T_1^{-1} - \Sigma_0^{-1} G_i \Sigma_0^{-1} G_j \Sigma_0^{-1}$$

breaks into seven terms, of which two typical ones are $\Delta G_i \Sigma_0^{-1} G_j \Sigma_0^{-1}$ and $\Delta G_i \Delta G_j \Sigma_0^{-1}$. Each of the seven terms that $\phi_{11}$ breaks into is then of the form $(y - X\alpha_0)' F_0 F_1 F_2 (y - X\alpha_0)$ for the following values of $F_0$, $F_1$, $F_2$ (again the $n_i$ and $n_j$ are inserted in the appropriate places).
Table A.4.4.5

Division of $\phi_{11}$ into Seven Terms with Appropriate Choices of $F_0$, $F_1$, and $F_2$


Term 


F 

£-0 


i0 


—  s^c-. 

n.  '-0  ~i 

l 


J 


1  -1 „ 
—  y  G. 
n.  £o  ~i 


1  -1  -1 
—  2L  G  .T 
n  .  r>£)  /Nrfjrwl 
3 


(So-EFS1 


3 


—  t^g.s:  1 
ni  ~1  ~xMD 


—  G.2"i: 
n  ~o~0 
J 


E'1(s_-T)  — - —  T~1G.rT1G  .t”1 

~r)  N-rw0  n .  n . 

u  1  3 


(So-h^1 


ZZ1^- T-.)  ~  T"1G.Tr1(S/,-T1)S"1  -=-G.S'3 

,-^0  v<~-0  ~1/  n.  ~1  ~i~l  ~0  ~l/<~0  n..  ~j~0 

X 


6 


l 

—  ZL  G. 
n.  ~0  ~i 
i 


—  t:1(sa-t1)z“1g.t:1 

n.  ~1  Si^0  ~1  tvo  ~n~l 


s"1  (ZL-T..  )  -i-  t:XG . T^IL-T-.  )E“XC-  .Trx  (Sa-T1  )ZT 

-0  ~0  rsyQ_ 7  n « n .  ^1-^1  /^-0  'vL y  **■0 

i  a 


i  m  i  f 


„-l 


7 
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By  inspection  of  Table  A. 4. 4. 5,  it  is  easily  seen  that  all  F_  and  Fn 

are  of  such  a  form  as  to  fit  into  Table  A. 4. 4. 3.  Furthermore  all  F.. 

are  such  that  X  A)  -*  0.  (This  is  done  by  application  of 

max  ^ 

Lemma B.13 and Proposition A.3.2 and then using Table A.4.4.2.) These
two facts together yield that each of the seven terms converges to
zero  said  hence  that  0^  -»  0  as  n  -*  ®  independent'  of  jr^. 


This  now  covers  all  possible  cases  of 
is  proved,  j  j ] 


a2x 


and  thus  the  lemma 


A.. 4. 5.  Verification  of  Conditions  3»3.1.v  and  3-3. l.i — Uniform 

for  e  S^(^Qn)  in  Probability 


*2 

■9.^tinuitXof 


An 


LEMMA  A. 4. 5-  For  jr_  and  X (jr,  jO  as  defined  in  Section  4.5  and  for  any 
b  >  0,  if  Conditions  A. 2.1  and  A. 3.1  are  true,  then 

,  . 

rr vr —  is  a  uniformly  continuous  function  of  in  (jt^ ) , 

*  j  n 

i 5  j=l,2 } . . • jP • 

PROOF. 

Let  T)  >  0  be  given  and  let  e  ^(^Qn)  »  It  Must  "be  shown  that 
there  exists  6  >  0  (without  loss  of  generality  6  <  such  that  for 

fcn  e  S6(iln> 
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M2  <  -L 
03  *  2 

n. 

1 


a'£Xli-£e>  '<&-&>[  W&'S^SiSeV 


V1 


by Proposition A.3.4,


<-  a2  1  -oS  64 

si  —  •  p0- b  *  — 

ni  °0i 


by Propositions A.3.1, A.3.2, and A.3.10. This is again of the desired
form.

For  as  above  =  T_^  +  Ag  and  thus  there  are  three  terms  in 


the  difference  of  the  form  §'x'F-X 

/v  /v<  1 


which  can  be  bounded  by 


Propositions A.3.4, A.3.2, A.3.3, A.3.1 and A.3.10 as follows in
Table A.4.5.1.


Table A.4.5.1

Division  of  0^  into  Three  Terms  with  Appropriate 


and Bounds for Squares of Each Term


Term


2 


-1 

1 


Eventual  Bound  for  its  Square 


256pQB2 


22.,  N2 
n.cn..  min  (n.cm.) 
i  Oi  o  Of 

j=0 jlj ... 


256p0B2 

4%i  _  13111  (nfQl)2 

j-0,1, ... 


3 


-l 

l 


to96p0B2 

2  2  7  75 

n.a_.  min  (n.a0.) 
i  Oi  j  oy 

j=Ojlj • . • ,p1 


All  of  these  bounds  have  the  desired  form. 

It  remains  to  dispose  of  0^.  Again  there  will  be  three  terms  of 

the  form  X  F_F_(y-X  a.)  which  can  be  bounded  by  Proposition  A. 3.4. 


V 
Table A.4.5.2

Division  of  0^  into  Three  Terms  with  Appropriate 

Fn  and  F„ 


Term 


F 


—  g.t"1 

n.  ~i~l 
l 


JL  T-ifi  m-1 

n.  ~1  £d~2 
i 


(Ei-Ie)Jl1 


\  (Si-Se'Ei1 


Again  it  is  apparent  that  all  these  terms  will  yield  proper  bounds 

2 

even  after  the  further  division  into  c  terms  using  the  Q  matrices 

/>5S 

and  w  vectors.  For  illustration  consider  Term  2  above. 


x  (a'f'm'f.a)  5  -| 

max  ~  ~1~'.  ~1~  2  2 


64 


n.  . 
l  Oi 


by  Propositions  A. 3.3  and  A. 3. 2.  Further,  as  in  the  comments  following 
Proposition  A.3.7,  F2=  (T^)!'1  -  (&-£,) (g1*  A).  (A  -  Tp1-  • 

Thus  w'  Q'  A'  F^  A  13  A  1  F0  A  Q,  w  breaks  into  four  terms  and  only 

^  ^  eKJdi  /v/ 


two  need  be  bounded.  The  first  term  is 

w'  Q'  Af  S”1(T1 -T  )A_t  A_1(T1-T0)  E”1  A  Q  w  ;  this  is  covered  in 
~s  ~s  ~  ~-0  ~2  ~  ***  ^1  ~2  ~0  ~  ~  ~s 
Proposition A.3.7 and has a bound involving $\delta$ and the proper
denominator. The second term is


w'  Q'  A"  g:1(E_-T1)T“1(T1-T0)A_t  A-1(T..  -T0)t:1(Ea-T1  )S"1A  Q  w 


<  w'  Q7  A7  E"1(E/,-Tn)A"t  A_1(E  -TjE"1  A  Q,  w 
~s  -^s  ~  'O  '~0  ~1  ~  ~  ~1  <~0  ~  ~s  ~e 


•  X  (A/T:1(Tn-T0)A~t  A"1^-^)^1  A) 
max  ~  ~1  ~1  ~2  ~  ~  ~1  —P  ~1  ~ 


by  definition  of  characteristic  root.  But  the  first  term  is  that  of 
Proposition  A. 3.6  and  is  properly  bounded  and  the  second  is  less  than 

4  2 

or  equal  to  — r—j - r  •  6  .  All  the  other  cases  also  give  proner 

mn(n.a0  )  ~ 

<3 

j=0,l, ... ,p^ 

bounds.  (All  terms  are  bounded  by  a  correct  function  of  6  and  a 
suitable  constant.)  This  settles  the  case  for  0^  and  hence  for  these 


derivatives  of  the  form 


Bt.S3 
i  ~ 


Now  consider  the  derivatives  of  the  form 


A 

_  1  ft 

'  _1  _1  ( 
4--«rn  “4“ri  rn  -*-r«  of 

@ 

r  "u"  /n'“^  f- 

„.x  ^  yi 

3TiSTo 

.  '  2n.n  L 

i=nn  1  => 

0J71  u-.X  Lr.-c-l; 

^  ~i~a  v 

/ -  A  1  1  Lr .  -t  Cr  .x  i  , 

~  n  ->  /  ~a  ~i~a  ~n~a  v 

P-,+1 

1  ~  v+i^’ 

i > D~ Ojlj . . .  jP-i  ?  2—1,2. 


It is sufficient for these derivatives to show that, for all
$i, j = 0, 1, \ldots, p_1$,
and


*o  -  2ZV  ltr  -  tr 


1  D 


*1*  n 


~  |(y-X  ---■--)  Tl1G.T"1G.T"1(jr-X  — — ) 
.n.  V*-'  xi  ,  /  ~2  ~l~2  ~]~2  ^  n  .  ■  J 

i  j  Pl+i  °  - 


Pi+1 


"  (X"X  TT^~)  T“1G.T"1G.T"1(y-X  ~~1— )| 


v1 


P-L+r 


are both bounded by suitable functions of $\delta$.

For $\phi_0$ there are three terms of the form $\frac{1}{2 n_i n_j} \left| \mathrm{tr}\, E_1 G_i E_2 G_j \right|$ and hence Propositions A.3.8, A.3.9 and A.3.10 apply to give the following bounds in Table A.4.5.3.


Table A.4.5.3

Division of $\phi_0$ into Three Terms with Appropriate $E_1$ and $E_2$ and Bounds for Each Term


Term 


E, 


,-l 


t"1 


Bound  for  7— —  Itr  E  G . E  G  . I 
2n.n .  1  ~l~i~2~j‘ 


1  1 


SB 


,-l 


3 


mln<Vok)aoic'o3 

k=0,lj ...  ?p-^ 

T^CT  -T0)T~^  -7-7 - — — r - 

^=0jlj . • . >px 

32B _ 

min(nha0k)  CJoiCTOj 


h— 0 jlj  .  ••  jp-. 


All these bounds are of the proper form.

Now $\phi_1$ breaks into nine parts by Lemma B.16 as follows:


*11  =  msrl  fe-£o)  fe-S®)  I 


(3  -3  ) * 

— ^— |—  — . -  x/(t“1g.t"1g .tZ1-t"1c-.t“1g .a?'1) (v-x*J  | 

n .  n  n  .  ~  ~2  <~i~2  '~o~2  <~i  <xl~1  ~j,~1  ~  <^<~0 

i  a  p-l+i 


/  Q  _Q  \ 

~— |  (y -2&J  ,(Tr1G.T"1G.Tl1-T-1G.T“1G.T“1)X  —  ■—  | 

n.n.  *■>  ~~0  '<~2  ~i~2  ~n~2  ~1  ~a~l  ~i~l  n  . 

i  J  p.j^+1 


_I_j  x/(tI1g.tZ1g.t~1-t~1g.t~1g.t~1)x^—  -  I 

n.n.*  n  -  ~  ~2  ~1  ~i~l  ~  n  1 

-Pi  a~ 


~— |  (y-3£j  't^g.t^g  .t'-Sc  ■■— ■  ~2^  j 

mn, 1  — o'  zq  ~i~2  ~g~2  n  ,n  1 


V1 


n.n. ‘ . 
i  0 

- = —  X 

V1  ~ 

1  | 
n.n. 1 

i  a 

<£o-£l> ' . 

n-D  +1 
‘1 

1  | 

n.n.1 
i  i 

Vi  : 

1  1 

L  U-.J-o  U-  .  -L-,  -A.  - 

~2  ~i~2  ~j~2  ~  n 


P-,+1 


ill  '•P/ 

x't“  g.t~  g.t“t:  — 

~  *~J2.  ~  n 


p-jj+i 


P-L+l 
All of these terms can be properly bounded using Proposition A.3.4.
The first four must be divided into seven terms in the usual way
using $T_2^{-1} = T_1^{-1} + \Delta_2$. The last three are easily bounded as follows
using Propositions A.3.4, A.3.2, and A.3.10.


?b2 


^17  s 


•o 


2  2 
n.n. 
i  3 


B 


4096 
2  2 
aOiCTOo 


n.n . 
i  J 


B 


4096 
2  2 
CT0iCT03 


m2  r:  0 
^19  2  2 
*  n.n. 
i  J 


B 


4096 
2  2 
aoiCTo0- 


All  these  bounds  are  of  the  proper  form. 

0^„  and  are  equal  except  for  changing  i  and  j  so  the  same 

2 

bound  applies  to  each.  Since  a  6  term  will  come  from  the 

part  it  must  be  verified  only  that  boundedness  for  the  remainder  can 

be  derived  by  Proposition  A. 3. 4  with  F]_=  -  T^Vt^1  and  Fg=  ^  Gh1  . 

i  3 

But  the  same  argument  used  above  for  Fg=  —  can  also  be  applied 

to  this  F^  yielding  a  proper  bound.  Thus  0^  and  0^g  can  be  properly 
bounded. 


Now the decomposition of the other terms is summarized; the source
of the all-important $\delta$ terms is noted. It can be verified by inspection
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that  all  bounds  will  be  of  proper  form.  0  ^  breaks  into  seven  terms 


of  the  form 


(jL-gv  (js -e  ) 

‘*“0  ~1  v/-n  V  ~1 


— - X'F.,X  — - — —  with  F  =  E  G.E.C-  ,E.  Such  terms 

p^+1  Pp+1  0 


are bounded by Propositions A.3.4, A.3.3, A.3.2, and A.3.10.


Table  A.  4. 5. 4 

Division  of  0^  into  Seven  Terms  with  Appropriate 
En ,  E0  and  E_  and  Source  of  6  Term 


Term  E^ 


Source  of  6  Term 


T"1(T1 -T„)tZ2 


2  Tn 


3  T, 


4  t: 


5  Ei^Ei-Se^1 


T:1^-^)!-1  EnJE, 

<v|  'rvj_  i  J 


6 


-1  V~1  ~2 1~2 


7  t"1^..  -t0)tZ1  Tr-^-TjT"1  t"1^., -t  J?‘] 

^v]  ^v|  »v2  /v-2  rv>1  «vP  ^vl  /v1_  >s^  <vP 


il’t£ 


T?  V  -p1 
~15~2’~3 


012  and  0^  are  the  transpose  of  each  other  upon  interchange  of 

(Pq-P-l)  ' 

i  and  J.  0no  breaks  into  seven  terms  of  the  form - X'F.F-fy-Xb'-) 

lii  n  ,  ~  ~i.~2  ~  <~~o 

Px+1 


which can be bounded by Propositions A.3.1-A.3.7 and A.3.10.
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Table  A.4.5.6 

Division  of  0  into  Seven  Terns  with  Appropriate 
Fn,  Fn  and  F_  and  Source  of  6  Term 


Source  of 


Term 


F 

~0 


F, 


,-l 


1  T.  (T.-Tj  -  tI^G.T; 

'vL  ~1  ~2  n.^2  ~i/vl 


2  -  T~XG. 

n.'vl 
i 


3  -  T^G. 

n.~l  ~i 

l 


4  -  T^G. 

n.~l  ~i 

l 


-  t^g.tT1 

n  .~1 
0 


-  t:1(t1-t0)t:1g.t:1 

n  ,~1  ~1  '■'■2  <-~2 

D 


5  t^Ct-tJ  -4-  tZWW1 

i  3 


6  T^CT^-T.)  -  t:1g.t"1(t1-t0)t"1 

~1  ~1  ~2  n .  ~2  '~<l~1  ~1  ~2  ~2 

l 


i  0 


F2 

6  Term 

-  c-  .t"1 

n 

«] 

F 

-  g.t"1 

n  .~j~l 
cl 

F, 

~L 

£2 

(Ei-Se)*!1 

£i»Ee 

Eo’^2 

-  G  .T"1 
n  .~}~1 

D 

X0?F1 

(T.-T  JT'1 

~L  ~2  ~1 

lo>Zl>l2 

As  noted  before  every  term  above  yields  a  bound  of  the  proper 
form  after  all  decompositions  (including  the  decomposition)  have 

been made. Each term also includes a $\delta$ term. Thus all terms have
proper bounds and the lemma is true. |||


APPENDIX  B 


ALGEBRAIC LEMMAE USED IN PREVIOUS CHAPTERS


The following lemmae are used in previous chapters. No claim of
originality is made for any of them. They are collected here so that
the reader may easily refer to them. Only a few proofs are given.
The other lemmae can be proved by algebraic manipulation or application
of simple well-known results. References are given where
appropriate.

LEMMA B.1. If $y \sim N_n(\mu, \Sigma)$, $\Sigma$ positive definite, and $B$ is an $n \times n$ symmetric constant matrix and $b$ an $n \times 1$ constant vector, then

$$E[(y-b)'B(y-b)] = \mathrm{tr}\, B\Sigma + (\mu-b)'B(\mu-b)$$

and

$$\mathrm{Var}[(y-b)'B(y-b)] = 2\,\mathrm{tr}(B\Sigma)^2 + 4(\mu-b)'B\Sigma B(\mu-b).$$

PROOF.

This lemma is easily proved by algebraic manipulation of the
moments up to fourth order of the multivariate normal distribution.
These moments can be found in Anderson (1958:39). |||


LEMMA B.2. If $z \sim N_n(0, I)$ and $b$ is an $n \times 1$ constant vector and $A$ is an $n \times n$ diagonal constant matrix, then

$$E\left[e^{\,i(b'z + z'Az)}\right] = |I - 2iA|^{-1/2}\, e^{-\frac{1}{2} b'(I - 2iA)^{-1} b}.$$

PROOF.

The proof of this lemma is merely an extension of proofs of the
characteristic function of a multivariate normal random variable. It
is done by completing the square. A reference is Plackett (1960:16-17). |||


LEMMA B.3. If $x_1, x_2, \ldots, x_p$ are all nonnegative and at least one $x_i$ is positive and $b_1, b_2, \ldots, b_p$ are all positive and $a_1, a_2, \ldots, a_p$ have any sign, then

$$\min_{i=1,2,\ldots,p} \frac{a_i}{b_i} \;\le\; \frac{\sum_{i=1}^{p} x_i a_i}{\sum_{i=1}^{p} x_i b_i} \;\le\; \max_{i=1,2,\ldots,p} \frac{a_i}{b_i}.$$

PROOF.

Let $p_i = \dfrac{x_i b_i}{\sum_{j=1}^{p} x_j b_j}$. ($\sum_{j=1}^{p} x_j b_j$ is positive because at least one $x_j$ is positive and all the $b_j$ are.) Then $0 \le p_i \le 1$ and $\sum_{i=1}^{p} p_i = 1$. Let $c_i = a_i / b_i$. Then $\sum_{i=1}^{p} p_i c_i$ is a weighted arithmetic mean of the $c_i$ and is thus between $\min_i c_i = \min_i a_i/b_i$ and $\max_i c_i = \max_i a_i/b_i$. But

$$\sum_{i=1}^{p} p_i c_i = \sum_{i=1}^{p} \frac{x_i b_i}{\sum_{j=1}^{p} x_j b_j} \cdot \frac{a_i}{b_i} = \frac{\sum_{i=1}^{p} x_i a_i}{\sum_{i=1}^{p} x_i b_i}$$

and the lemma is proved. |||
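As an illustration of Lemma B.3 (not part of the original text), the short sketch below evaluates the ratio of the two weighted sums and the extreme values of $a_i/b_i$ for one arbitrary set of numbers. The function name `ratio_bounds` and the sample data are hypothetical; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def ratio_bounds(x, a, b):
    """Return (min a_i/b_i, sum(x_i a_i)/sum(x_i b_i), max a_i/b_i).

    Hypothetical helper illustrating Lemma B.3: x must be nonnegative with
    at least one positive entry, b must be strictly positive, a is arbitrary.
    """
    if any(xi < 0 for xi in x) or not any(xi > 0 for xi in x):
        raise ValueError("x must be nonnegative with at least one positive entry")
    if any(bi <= 0 for bi in b):
        raise ValueError("every b_i must be positive")
    ratios = [Fraction(ai, bi) for ai, bi in zip(a, b)]
    numerator = sum(xi * ai for xi, ai in zip(x, a))
    denominator = sum(xi * bi for xi, bi in zip(x, b))
    return min(ratios), Fraction(numerator, denominator), max(ratios)


# Arbitrary integer example: the middle value must lie between the other two.
lower, value, upper = ratio_bounds(x=[1, 0, 2], a=[3, -4, 1], b=[2, 1, 5])
print(lower, value, upper)
```

The middle value is the weighted mean $\sum p_i c_i$ from the proof, so it can never leave the interval spanned by the individual ratios.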
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to  jPoj*  •  •  >3-  •  Let  these  vectors  he  c*,  .o'  , . . .  ,a  . .  Then  x  e  £  if 
<~i  rJc.  *vi  <"1  '"-2  ~n-i  ~ 


and  only  if  x7a.=  0  for  j=l,2, . . .  ,n-i.  Lemma  B.4  then  applies,  proving 

/--Q 


this  lemma.  m 


LEMMA B.6. For $A$ and $B$ as in Lemma B.4, $\lambda_i(A^{-1}B) = \dfrac{x_i'Bx_i}{x_i'Ax_i}$, where $x_i$ is the characteristic vector associated with $\lambda_i$.


LEMMA B.7. Suppose $B_1 = \sum_{i=0}^{p} b_{1i} G_i$, $B_0 = \sum_{i=0}^{p} b_{0i} G_i$, all $G_i$ are positive semidefinite and at least one is positive definite and $b_{0i} > 0$, $i = 0, 1, \ldots, p$. Then

i) $\lambda_j(B_0^{-1}B_1) = \dfrac{\sum_{i=0}^{p} b_{1i}\, x_j'G_ix_j}{\sum_{i=0}^{p} b_{0i}\, x_j'G_ix_j}$, where $x_j$ is the characteristic vector associated with $\lambda_j$,

ii) $\min_{i=0,1,\ldots,p} \dfrac{b_{1i}}{b_{0i}} \le \lambda_j(B_0^{-1}B_1) \le \max_{i=0,1,\ldots,p} \dfrac{b_{1i}}{b_{0i}}$,

iii) If $H$ is an $n \times m$ matrix such that $x'H = 0$ implies $x'G_ix = 0$, $i = j+1, \ldots, p$, for some $j = 0, 1, \ldots, p$, then

$$\lambda_{m+1}(B_0^{-1}B_1) \le \max_{i=0,1,\ldots,j} \frac{b_{1i}}{b_{0i}}.$$
PROOF.

Statement i) follows from Lemma B.6 and the definitions of $B_0$ and $B_1$. Statement ii) is a simple application of Lemma B.3 to part i). To prove Statement iii), observe that

$$\lambda_{m+1}(B_0^{-1}B_1) \le \sup_{\substack{x \ne 0 \\ x'H = 0}} \frac{x'B_1x}{x'B_0x}$$

by Lemma B.4,

$$\le \sup_{x \ne 0} \frac{\sum_{i=0}^{j} b_{1i}\, x'G_ix}{\sum_{i=0}^{j} b_{0i}\, x'G_ix} \le \max_{i=0,1,\ldots,j} \frac{b_{1i}}{b_{0i}}$$

by Lemma B.3. Note that in both parts ii) and iii) the correspondences between Lemma B.3 and Lemma B.7 are as follows:

Lemma B.3      Lemma B.7
$x_i$          $x'G_ix$
$a_i$          $b_{1i}$
$b_i$          $b_{0i}$
LEMMA B.8. Let $B$ be an $n \times n$ symmetric matrix and $C$ an $n \times m$ matrix, then

$$\max_{j} |\lambda_j(C'BC)| \le \lambda_{\max}(C'C) \max_{k=1,2,\ldots,n} |\lambda_k(B)|.$$

PROOF.

$$\lambda_j(C'BC) = \frac{x_j'C'BCx_j}{x_j'x_j} = \frac{x_j'C'BCx_j}{x_j'C'Cx_j} \cdot \frac{x_j'C'Cx_j}{x_j'x_j},$$

where $x_j$ is the characteristic vector associated with $\lambda_j$. (If $Cx_j = 0$ that root is not of interest in any case, since only the nonzero characteristic roots of $C'BC$ are of concern.) Therefore, since $x_j'C'Cx_j \ge 0$,

$$|\lambda_j(C'BC)| = \left|\frac{x_j'C'BCx_j}{x_j'C'Cx_j}\right| \cdot \frac{x_j'C'Cx_j}{x_j'x_j} \le \sup_{y \ne 0} \left|\frac{y'By}{y'y}\right| \lambda_{\max}(C'C) = \lambda_{\max}(C'C) \max_{k=1,2,\ldots,n} |\lambda_k(B)|. \quad |||$$


LEMMA B.9. Let $A$ and $B$ be $n \times n$ matrices with $A$ nonsingular. Then $\lambda_i(AB) = \lambda_i(BA)$.

LEMMA B.10. Let $B$ be an $n \times n$ matrix. Then $\lambda_i(B) = \lambda_i(B')$.

LEMMA B.11. Let $A$ be an $m \times n$ matrix and $B$ be an $n \times m$ matrix. Then the nonzero characteristic roots of $AB$ are equal to the nonzero characteristic roots of $BA$.


LEMMA B.12. Let $x$ and $y$ be $n \times 1$ vectors and $A$ an $n \times n$ matrix. Then

s  fe'5)(z'z)  Wi's- 


LEMMA B.13. Let $A$ be $n \times n$ positive semidefinite and let $B$ be $n \times n$. Then

$$\max_{k=1,2,\ldots,n} |\lambda_k(AB)| \le \lambda_{\max}(A) \max_{k=1,2,\ldots,n} |\lambda_k(B)|.$$

PROOF.

This lemma follows immediately from Lemmae B.8 and B.9 and the
fact that there exists $C$ such that $A = C'C$. |||


LEMMA B.14. For $A$ $n \times n$, $\max_{k=1,2,\ldots,n} |\lambda_k(A^j)| = \left[\max_{k=1,2,\ldots,n} |\lambda_k(A)|\right]^j$, $j = 1, 2, 3, \ldots$.


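LEMMA B.15. If $S_1$ and $S_2$ are $n \times n$ matrices and $t_0$, $t_1$, $t_2$ and $y$ are $n \times 1$ vectors, then the following statements are true.

$$S_2(y - t_2) - S_1(y - t_1) = (S_2 - S_1)(y - t_0) + (S_2 - S_1)(t_0 - t_1) + S_2(t_1 - t_2).$$

$$\begin{aligned}
(y - t_2)'S_2(y - t_2) - (y - t_1)'S_1(y - t_1)
 ={}& (y - t_0)'(S_2 - S_1)(y - t_0) + (t_0 - t_1)'(S_2 - S_1)(y - t_0) \\
 &+ (y - t_0)'(S_2 - S_1)(t_0 - t_1) + (t_0 - t_1)'(S_2 - S_1)(t_0 - t_1) \\
 &+ (t_1 - t_2)'S_2(y - t_0) + (y - t_0)'S_2(t_1 - t_2) \\
 &+ (t_1 - t_2)'S_2(t_0 - t_1) + (t_0 - t_1)'S_2(t_1 - t_2) \\
 &+ (t_1 - t_2)'S_2(t_1 - t_2).
\end{aligned}$$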

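LEMMA B.16. If $S_0$ and $S_1$ are $n \times n$ matrices and $t_0$, $t_1$ and $y$ are $n \times 1$ vectors, then the following statements are true.

$$S_1(y - t_1) - S_0(y - t_0) = (S_1 - S_0)(y - t_0) + S_1(t_0 - t_1).$$

$$\begin{aligned}
(y - t_1)'S_1(y - t_1) - (y - t_0)'S_0(y - t_0)
 ={}& (y - t_0)'(S_1 - S_0)(y - t_0) \\
 &+ (t_0 - t_1)'S_1(y - t_0) \\
 &+ (y - t_0)'S_1(t_0 - t_1) \\
 &+ (t_0 - t_1)'S_1(t_0 - t_1).
\end{aligned}$$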
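The four-term decomposition of a difference of quadratic forms in Lemma B.16 is a purely algebraic identity, so it can be checked on any small example. The sketch below is an illustration only; the helper names and the integer test data are hypothetical, and $S_1$ is deliberately chosen non-symmetric to show that symmetry is not needed.

```python
def quad(u, S, v):
    """Return the bilinear form u' S v for lists u, v and a square matrix S."""
    return sum(u[i] * S[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def vec_sub(u, v):
    """Elementwise difference of two vectors."""
    return [ui - vi for ui, vi in zip(u, v)]


def mat_sub(A, B):
    """Elementwise difference of two matrices."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def four_terms(S0, S1, t0, t1, y):
    """The four right-hand terms of the quadratic-form identity."""
    r0 = vec_sub(y, t0)   # y - t0
    d = vec_sub(t0, t1)   # t0 - t1
    return [
        quad(r0, mat_sub(S1, S0), r0),
        quad(d, S1, r0),
        quad(r0, S1, d),
        quad(d, S1, d),
    ]


# Arbitrary integer data so that the check is exact.
S0 = [[2, 1], [1, 3]]
S1 = [[1, 0], [4, 5]]
t0, t1, y = [1, -1], [0, 2], [3, 4]

lhs = quad(vec_sub(y, t1), S1, vec_sub(y, t1)) - quad(vec_sub(y, t0), S0, vec_sub(y, t0))
terms = four_terms(S0, S1, t0, t1, y)
print(lhs, terms, sum(terms))
```

The identity follows from writing $y - t_1 = (y - t_0) + (t_0 - t_1)$ and expanding, which is exactly what the four terms record.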


APPENDIX  C 


A  COMPUTER  PROGRAM  TO  IMPLEMENT 
THE  ITERATIVE  PROCEDURE 

C.1. Algebra Used in the Computer Program

A computer program has been written to implement The Iterative Procedure, which was introduced in Chapter 5. This section contains a brief description of the algebra used to write the program. This algebraic manipulation was used instead of more straightforward methods of calculation in order to reduce core storage requirements in the computer. The straightforward methods require the storage of several $n \times n$ matrices in order to compute the quantities required for the solution of the iterative equations ($\Sigma$ and perhaps each of the $G_i$). Since this requires a great deal of core for even moderate $n$, some improvement is needed. The algebraic manipulations given here reduce the maximum dimension of the matrices which must be stored to $m = \sum_{i=1}^{p_1} m_i$. This is usually much smaller than $n$. For example, in the two-way balanced layout described in Section 6.1, $n = IJK$ and $m = IJ + I + J$; even for small $I, J, K$ appreciable savings can result. For instance, if $I, J, K$ is 2, 3, 3, then $n = 18$ but $m = 11$, and if $I, J, K$ is 3, 6, 4, then $n = 72$ but $m = 27$. The basic result used in these manipulations is due to Woodbury (1950) and is stated as Proposition C.1.1.
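The storage comparison above is simple arithmetic; the following sketch (an illustration, with a hypothetical function name) reproduces the two examples quoted for the two-way balanced layout, using only the formulas $n = IJK$ and $m = IJ + I + J$ stated in the text.

```python
def two_way_dimensions(I, J, K):
    """Return (n, m) for the two-way balanced layout: n = IJK, m = IJ + I + J."""
    n = I * J * K
    m = I * J + I + J
    return n, m


for levels in [(2, 3, 3), (3, 6, 4)]:
    n, m = two_way_dimensions(*levels)
    # An n x n matrix needs n*n stored entries; an m x m matrix needs m*m.
    print(levels, "n =", n, "m =", m, "entries:", n * n, "vs", m * m)
```

Because storage grows with the square of the dimension, the saving is larger than the ratio $n/m$ alone suggests.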

PROPOSITION C.1.1. Let $\Sigma = \sigma_0 I + UDU'$, where $\Sigma$ and $I$ are $n \times n$ and $U$ is $n \times m$ and $D$ is $m \times m$ diagonal and nonsingular. Then

$$\Sigma^{-1} = \frac{1}{\sigma_0}\left(I - U(\sigma_0 D^{-1} + U'U)^{-1}U'\right).$$

PROOF: This proposition is a special case of a result of Woodbury
(1950). |||
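A small numerical check of Proposition C.1.1 may help fix ideas. The sketch below is not the program described in this appendix; it is a hypothetical pure-Python illustration with exact rational arithmetic, and all function names are assumptions. It builds $\Sigma = \sigma_0 I + UDU'$ for a tiny one-factor design, inverts it directly, and compares the result with the formula of the proposition, which only requires inverting the $m \times m$ matrix $\sigma_0 D^{-1} + U'U$.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def identity(k):
    return [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]


def inverse(A):
    """Gauss-Jordan inverse using exact Fraction arithmetic."""
    k = len(A)
    M = [[Fraction(v) for v in row] + unit for row, unit in zip(A, identity(k))]
    for c in range(k):
        pivot_row = next(r for r in range(c, k) if M[r][c] != 0)
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]


def woodbury_inverse(sigma0, U, d):
    """Inverse of sigma0*I + U diag(d) U' via Proposition C.1.1."""
    n, m = len(U), len(d)
    Ut = transpose(U)
    F = matmul(Ut, U)  # U'U, an m x m matrix
    E = [[F[i][j] + (Fraction(sigma0) / d[i] if i == j else 0) for j in range(m)]
         for i in range(m)]  # sigma0 * D^{-1} + U'U
    UEU = matmul(matmul(U, inverse(E)), Ut)
    I = identity(n)
    return [[(I[i][j] - UEU[i][j]) / sigma0 for j in range(n)] for i in range(n)]


# Tiny example: 4 observations, one random factor with 2 levels.
U = [[1, 0], [1, 0], [0, 1], [0, 1]]
sigma0, d = 3, [2, 2]

Sigma = matmul([[u * dj for u, dj in zip(row, d)] for row in U], transpose(U))
for i in range(len(U)):
    Sigma[i][i] += sigma0

direct = inverse(Sigma)
via_woodbury = woodbury_inverse(sigma0, U, d)
print(direct == via_woodbury)
```

In this example only a $2 \times 2$ matrix is inverted inside `woodbury_inverse`, whereas the direct route inverts a $4 \times 4$ matrix; the gap widens quickly as $n$ grows with $m$ fixed.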


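The matrix which must be inverted for this computer program may be written in the above form, where $U = [U_1 : U_2 : \cdots : U_{p_1}]$ and

$$D = \begin{pmatrix} D_1 & 0 & \cdots & 0 \\ 0 & D_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & D_{p_1} \end{pmatrix}, \quad \text{where } D_i = \sigma_i I_{m_i}.$$

Thus $m = \sum_{i=1}^{p_1} m_i$. Let $F = U'U$ be partitioned as

$$F = \begin{pmatrix} F_{11} & \cdots & F_{1p_1} \\ \vdots & & \vdots \\ F_{p_1 1} & \cdots & F_{p_1 p_1} \end{pmatrix},$$

where $F_{ij} = U_i'U_j$. Let $E = (\sigma_0 D^{-1} + F)$ be partitioned similarly. Let

$$H = U'X = \begin{pmatrix} H_1 \\ \vdots \\ H_{p_1} \end{pmatrix},$$

where $H_i = U_i'X$. Let

$$w_0 = U'y = \begin{pmatrix} w_{01} \\ \vdots \\ w_{0p_1} \end{pmatrix},$$

where $w_{0i} = U_i'y$. Let $A_0 = X'X$ and $b_0 = X'y$.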
Then $F$ and $E$ are $m \times m$, $w_0$ is $m \times 1$, $H$ is $m \times p_0$, $A_0$ is $p_0 \times p_0$ and $b_0$ is $p_0 \times 1$. These quantities and $y'y$ are the only quantities needed to compute an iteration of The Iterative Procedure. Note that $F$, $H$, $w_0$, $A_0$, $b_0$ and $y'y$ may all be computed as the data are read in, thus eliminating any need to save the large matrix $U$. $E$ can then be formed at the start of each iteration and the matrices $Q = E^{-1}F$ and $P = E^{-1}H$ can be formed by solving linear equations. Thus it is never necessary to actually invert $E$.
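The remark that the summary matrices can be built up while the data are read suggests a one-pass accumulation. The sketch below is a hypothetical illustration in Python, not the original program; the function and variable names are assumptions. Each record supplies one row of $U$, one row of $X$ and one observation, and the running sums give $F = U'U$, $H = U'X$, $A_0 = X'X$, $w_0 = U'y$, $b_0 = X'y$ and $y'y$ without ever storing $U$ or $X$.

```python
def accumulate_summaries(rows, m, p0):
    """One-pass accumulation of F, H, A0, w0, b0 and y'y.

    rows yields (u, x, y): u is a length-m row of U, x a length-p0 row of X,
    and y the scalar observation. Nothing of size n is kept in memory.
    """
    F = [[0] * m for _ in range(m)]      # U'U
    H = [[0] * p0 for _ in range(m)]     # U'X
    A0 = [[0] * p0 for _ in range(p0)]   # X'X
    w0 = [0] * m                         # U'y
    b0 = [0] * p0                        # X'y
    yy = 0                               # y'y
    n = 0
    for u, x, y in rows:
        n += 1
        yy += y * y
        for i in range(m):
            w0[i] += u[i] * y
            for j in range(m):
                F[i][j] += u[i] * u[j]
            for k in range(p0):
                H[i][k] += u[i] * x[k]
        for k in range(p0):
            b0[k] += x[k] * y
            for l in range(p0):
                A0[k][l] += x[k] * x[l]
    return {"n": n, "F": F, "H": H, "A0": A0, "w0": w0, "b0": b0, "yy": yy}


def read_records():
    """Stand-in for reading data cards: three made-up observations."""
    yield [1, 0], [1, 2], 3
    yield [1, 0], [1, 0], 1
    yield [0, 1], [1, -1], 2


summaries = accumulate_summaries(read_records(), m=2, p0=2)
print(summaries)
```

Using a generator for the input mirrors the card-by-card reading described in the text: each record is consumed once and discarded.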

The quantities necessary to perform one iteration of The Iterative Procedure are the elements of the matrix $B(\sigma_{(i)})$ and the vector $c[\sigma_{(i)}, \hat\alpha(\sigma_{(i)})]$ defined in Chapter 5. The iteration required is then

$$\sigma_{(i+1)} = B^{-1}(\sigma_{(i)})\, c[\sigma_{(i)}, \hat\alpha(\sigma_{(i)})],$$

which can also be solved without inverting the matrix $B(\sigma_{(i)})$. Thus it is never necessary to invert any matrix to use The Iterative Procedure.
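To obtain the elements of $c$, $\hat\alpha$ must first be calculated from the equation

$$X'\Sigma^{-1}X\hat\alpha = X'\Sigma^{-1}y,$$

or

$$A\hat\alpha = b.$$

But

$$\begin{aligned}
A &= X'\Sigma^{-1}X \\
  &= \frac{1}{\sigma_0} X'(I - UE^{-1}U')X \\
  &= \frac{1}{\sigma_0} (X'X - X'UE^{-1}U'X) \\
  &= \frac{1}{\sigma_0} (A_0 - H'P).
\end{aligned}$$

Also

$$\begin{aligned}
b &= X'\Sigma^{-1}y \\
  &= \frac{1}{\sigma_0} X'(I - UE^{-1}U')y \\
  &= \frac{1}{\sigma_0} (X'y - X'UE^{-1}U'y) \\
  &= \frac{1}{\sigma_0} (b_0 - P'w_0).
\end{aligned}$$

$\hat\alpha$ is obtained as

$$\hat\alpha = A^{-1}b$$

and the elements of $c$ are calculated from

$$[c]_i = (y - X\hat\alpha)'\Sigma^{-1}G_i\Sigma^{-1}(y - X\hat\alpha), \quad i = 0, 1, \ldots, p_1.$$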

For $i = 1, 2, \ldots, p_1$ this reduces to

$$\begin{aligned}
(y - X\hat\alpha)'\Sigma^{-1}G_i\Sigma^{-1}(y - X\hat\alpha)
 &= \frac{1}{\sigma_0^2} (y - X\hat\alpha)'(I - UE^{-1}U')U_iU_i'(I - UE^{-1}U')(y - X\hat\alpha) \\
 &= \frac{1}{\sigma_0^2} [U_i'(I - UE^{-1}U')(y - X\hat\alpha)]'[U_i'(I - UE^{-1}U')(y - X\hat\alpha)] \\
 &= \frac{1}{\sigma_0^2}\, t_i't_i,
\end{aligned}$$

where

$$t_i = U_i'(I - UE^{-1}U')(y - X\hat\alpha).$$

But if $w$ is defined by

$$w = U'(y - X\hat\alpha) = w_0 - H\hat\alpha,$$

partitioned the same as $w_0$, then

$$t_i = U_i'(I - UE^{-1}U')(y - X\hat\alpha) = w_i - \sum_{j=1}^{p_1} Q_{ji}'w_j,$$

where $Q$ and $P$ are partitioned just like $E$ and $H$ respectively. For $i = 0$ it is necessary to calculate

$$\begin{aligned}
(y - X\hat\alpha)'\Sigma^{-1}\Sigma^{-1}(y - X\hat\alpha)
 &= \frac{1}{\sigma_0^2} (y - X\hat\alpha)'(I - UE^{-1}U')(I - UE^{-1}U')(y - X\hat\alpha) \\
 &= \frac{1}{\sigma_0^2} \Big\{ (y - X\hat\alpha)'(y - X\hat\alpha) - 2(y - X\hat\alpha)'UE^{-1}U'(y - X\hat\alpha) \\
 &\qquad\quad + (y - X\hat\alpha)'UE^{-1}U'UE^{-1}U'(y - X\hat\alpha) \Big\} \\
 &= \frac{1}{\sigma_0^2} \Big\{ y'y - 2\hat\alpha'X'y + \hat\alpha'X'X\hat\alpha - 2[U'(y - X\hat\alpha)]'E^{-1}U'(y - X\hat\alpha) \\
 &\qquad\quad + [E^{-1}U'(y - X\hat\alpha)]'U'U[E^{-1}U'(y - X\hat\alpha)] \Big\} \\
 &= \frac{1}{\sigma_0^2} \Big\{ y'y - 2b_0'\hat\alpha + \hat\alpha'A_0\hat\alpha - 2w'z + z'Fz \Big\},
\end{aligned}$$

where $z = E^{-1}w$ and $z$ is partitioned as $w$.

AW  AW  .AW  AW  WV 
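Now $[B]_{ij}$ must be computed, $i, j = 0, 1, \ldots, p_1$. For $i = j = 0$,

$$\begin{aligned}
[B]_{00} &= \mathrm{tr}\, \Sigma^{-1}\Sigma^{-1} \\
 &= \frac{1}{\sigma_0^2}\, \mathrm{tr}\,(I - UE^{-1}U')(I - UE^{-1}U') \\
 &= \frac{1}{\sigma_0^2} \left\{ \mathrm{tr}\, I - 2\,\mathrm{tr}\, UE^{-1}U' + \mathrm{tr}\, UE^{-1}U'UE^{-1}U' \right\} \\
 &= \frac{1}{\sigma_0^2} \left\{ n - 2\,\mathrm{tr}\, E^{-1}U'U + \mathrm{tr}\, E^{-1}U'UE^{-1}U'U \right\} \\
 &= \frac{1}{\sigma_0^2} \left\{ n - 2\,\mathrm{tr}\, Q + \mathrm{tr}\, Q^2 \right\}.
\end{aligned}$$

For $j = 0$, $i = 1, 2, \ldots, p_1$,

$$\begin{aligned}
[B]_{i0} &= \mathrm{tr}\, \Sigma^{-1}\Sigma^{-1}G_i \\
 &= \frac{1}{\sigma_0^2}\, \mathrm{tr}\,(I - UE^{-1}U')(I - UE^{-1}U')U_iU_i' \\
 &= \frac{1}{\sigma_0^2} \left\{ \mathrm{tr}\, U_iU_i' - 2\,\mathrm{tr}\, UE^{-1}U'U_iU_i' + \mathrm{tr}\, UE^{-1}U'UE^{-1}U'U_iU_i' \right\} \\
 &= \frac{1}{\sigma_0^2} \left\{ n - 2\,\mathrm{tr}\, U_i'UE^{-1}U'U_i + \mathrm{tr}\, U_i'UE^{-1}U'UE^{-1}U'U_i \right\} \\
 &= \frac{1}{\sigma_0^2} \left\{ n - 2 \sum_{k=1}^{p_1} \mathrm{tr}\, F_{ik}Q_{ki} + \sum_{l=1}^{p_1} \sum_{k=1}^{p_1} \mathrm{tr}\, Q_{li}'F_{lk}Q_{ki} \right\}.
\end{aligned}$$

For $i, j = 1, 2, \ldots, p_1$,

$$\begin{aligned}
[B]_{ij} &= \mathrm{tr}\, \Sigma^{-1}G_i\Sigma^{-1}G_j \\
 &= \frac{1}{\sigma_0^2}\, \mathrm{tr}\,(I - UE^{-1}U')U_iU_i'(I - UE^{-1}U')U_jU_j' \\
 &= \frac{1}{\sigma_0^2}\, \mathrm{tr}\, U_j'(I - UE^{-1}U')U_iU_i'(I - UE^{-1}U')U_j \\
 &= \frac{1}{\sigma_0^2}\, \mathrm{tr}\, [U_i'U_j - U_i'UE^{-1}U'U_j]'[U_i'U_j - U_i'UE^{-1}U'U_j] \\
 &= \frac{1}{\sigma_0^2}\, \mathrm{tr}\, \Big[F_{ij} - \sum_{k=1}^{p_1} F_{ik}Q_{kj}\Big]'\Big[F_{ij} - \sum_{k=1}^{p_1} F_{ik}Q_{kj}\Big].
\end{aligned}$$

The above formulae demonstrate that indeed only the matrices $F$, $H$, $A_0$, $w_0$, and $b_0$ described above need be carried in the calculations. Thus it is never necessary to save the large matrices $U$ and $X$. This enables problems of a reasonable size to be run using this program, which would not be possible if matrices of dimension $n$ were needed.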
The above algorithms are used in a sequence of subroutines which compute the requirements for each iteration. There is one more algebraic facet of the program of which the user should be aware. In Section 5.6 slowly converging sequences which oscillated above and below the final value were mentioned. This program has a feature designed to eliminate this problem. Whenever two successive iterates are sufficiently different, $\sigma_{(i+1)}$ is formed by taking another iteration from $\sigma_{(i)}$ and then averaging. If averaging is necessary, the ordinary iterate is first calculated from $\sigma_{(i)}$, and $\sigma_{(i+1)}$ is then formed as one half of the sum of $\sigma_{(i)}$ and that iterate.

This feature eliminates the oscillating, but care must be taken that it not introduce a false convergence of its own. This is done by insisting that the last iteration must be one not involving the averaging process. The averaging and the safeguard process have worked very well on sample problems.
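The averaging device and its safeguard can be sketched as a generic damped fixed-point loop. The code below is only an illustration under stated assumptions: the function name, the thresholds `tol` and `jump`, and the rule "average whenever the plain step moves by more than `jump`" are hypothetical choices, since the text does not give the exact criterion used by the program. Convergence is declared only on a plain (non-averaged) step, which is the safeguard described above.

```python
def iterate_with_averaging(step, start, tol=1e-10, jump=1e-3, max_iter=200):
    """Fixed-point iteration with averaging of widely separated iterates.

    step maps a list of floats to the next plain iterate. Returns
    (estimate, number_of_iterations). tol and jump are assumed thresholds.
    """
    current = list(start)
    for count in range(1, max_iter + 1):
        proposal = step(current)  # plain iterate from the current value
        change = max(abs(p - c) for p, c in zip(proposal, current))
        if change <= tol:
            # Safeguard: stop only on a step that involved no averaging.
            return proposal, count
        if change > jump:
            # Iterates differ a lot: average the current value and the plain step.
            current = [(p + c) / 2 for p, c in zip(proposal, current)]
        else:
            current = proposal
    raise RuntimeError("no convergence within max_iter iterations")


def oscillating_step(sigma):
    """Toy map with fixed point 1 whose plain iterates oscillate forever."""
    return [2.0 - sigma[0]]


estimate, iterations = iterate_with_averaging(oscillating_step, [0.0])
print(estimate, iterations)
```

On the toy map the plain iteration alternates between 0 and 2 indefinitely, while a single averaged step lands on the fixed point and the next plain step confirms it.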


C.2. Setup and Output of the Computer Program

The computer program is designed to deliver as much freedom and convenience to the user as possible. The user must supply only two control cards. The first card contains the number of observations, $n$, the number of levels of fixed factors, $p_0$, the number of random factors, $p_1$, a number indicating how often the iterates are to be printed (see below) and two yes-no statements. The first is yes if the user supplies initial guesses and no otherwise. The second is yes if the short cut notation for the $U$ matrix is used and no otherwise. (See below for explanation of the short cut notation.) The second control card contains the number of levels at which each random factor appears, $m_i$, $i = 1, 2, \ldots, p_1$. (Thus there are $p_1$ numbers on the second control card.)

After the control cards come the data cards. The data are read in by rows or observations. There are two cards per row. The first card contains the appropriate row of the $U$ matrix. This may be in the form of zeroes and ones (there will be $m = \sum_{i=1}^{p_1} m_i$ numbers) or in short cut notation. In short cut notation only $p_1$ numbers are used, each number stating in which column of $U_i$ the one appears; this is a unique description because by definition there is exactly one 1 in each row. The second card contains the appropriate row of the $X$ matrix and the observation on $y$. There must be $n$ pairs of cards, one pair for each observation. The last card for a problem states yes or no: yes if another problem follows and no otherwise.
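The relation between the two input forms for a row of $U$ can be made concrete. The sketch below is an illustration, not the original program; the function name is hypothetical, and it assumes that the column numbers in short cut notation are counted from 1 within each $U_i$, which the text does not state explicitly.

```python
def expand_shortcut(columns, levels):
    """Expand one short cut row into the full 0/1 row of U.

    columns[i] is the (assumed 1-based) column of U_i holding the one,
    levels[i] is m_i, the number of levels of the i-th random factor.
    """
    if len(columns) != len(levels):
        raise ValueError("need one column number per random factor")
    row = []
    for column, m_i in zip(columns, levels):
        if not 1 <= column <= m_i:
            raise ValueError("column number outside 1..m_i")
        block = [0] * m_i
        block[column - 1] = 1   # exactly one 1 in the block for U_i
        row.extend(block)
    return row


# Two random factors with 2 and 3 levels: level 2 of the first, level 1 of the second.
full_row = expand_shortcut(columns=[2, 1], levels=[2, 3])
print(full_row)
```

The short cut form needs only $p_1$ numbers per observation instead of $m$, which is why it is offered as a convenience.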
The output of the program consists first of the intermediate iterations the user asked to be printed. If the user placed a $K$ on the first control card in the appropriate position, then the results after each $K$ iterations are printed. Then follow the final results and the number of iterations required. The estimated large sample covariance matrices for $\hat\alpha$ and $\hat\sigma$ are also printed; these matrices are estimated by $[X'\Sigma^{-1}(\hat\sigma)X]^{-1}$ and $2B^{-1}(\hat\sigma)$ respectively.

This computer program can handle fairly large problems with relative ease. The size of the largest matrix which should be inverted (or set of equations to be solved) is certainly less than 100 and probably closer to 50. (The reasons are twofold: computational accuracy and time requirements.) If $\Sigma$ were inverted directly the size of problem would be severely limited. However, using the indirect methods above much larger problems can be accommodated. Since very often $n$ is approximately a multiple of the product of some of the $m_i$, even when the sum of the $m_i$ is restricted to be small, $n$ could be large. For instance, even if the sum of the $m_i$ is less than 60, the possible values of $n$ could be well over 1000. The simplicity of the control cards and data input makes the program easy to use even for the statistician who is unfamiliar with computer programming. It seems to be an effective and efficient program, judged by its performance on sample problems.

Copies of the program deck are available from the author upon
request along with more detailed documentation and sample problems.
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APPENDIX  D 

COMMENTS  ON  THE  PROOF  OF  CONSISTENCY  OF  THE  MAXIMUM 
LIKELIHOOD  ESTIMATES  GIVEN  BY  HARTLEY  AND  RAO 


In  Section  1.1,  it  wa, s  mentioned  that  H.  0.  Hartley  and  J.  N.  K. 

Rao  (1967)  gave  a  proof  of  the  consistency  of  the  maximum  likelihood 
estimates  in  the  mixed  model  of  the  analysis  of  variance.  It  was 
remarked  that  the  theorem  was  true  hut  that  the  assumptions  used  were 
not  all  stated  correctly,  that  the  assumptions  were  restrictive  and 
that  important  details  were  omitted  from  the  proof.  In  this  section 
some  brief  comments  will  be  presented  to  elucidate  these  remarks.  The 
notation  used  will  be  the  notation  used  in  this  paper  rather  than  that 
used  by-Hartley  and  Rao  in  order  to  preserve  continuity. 

The assumption stated incorrectly occurs on page 102 of the paper
and is labeled 7.iii. It requires that for each design in the sequence
of designs the matrix $M = [X : U_1 : \cdots : U_{p_1}]$ have a basis
$W = [X : U^*]$, where $U^*$ contains at least one column from each
$U_i$. An assumption of this sort is necessary to insure estimability
of the parameters. That this assumption is not sufficient can be seen
in the following counterexample. Let $U_1$ be any matrix such that the
basis of $[X : U_1]$ is $[X : U^*]$, where $U^*$ contains at least $p_1$
columns from $U_1$. Then let $U_j = U_1$, $j = 2, 3, \ldots, p_1$. In
such a model, instead of each individual $\sigma_1, \ldots, \sigma_{p_1}$,
only $\sigma_1 + \sigma_2 + \cdots + \sigma_{p_1}$ can be estimated, yet
this model satisfies the given assumption. One possible way to correctly
state the assumption is to use two separate assumptions like
Assumptions 1.3.4 and 1.3.5.
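
The counterexample can be checked numerically. The following is a
minimal sketch, assuming the covariance structure
$\Sigma = \sigma_0 I + \sum_i \sigma_i U_i U_i'$ for the model of
Section 1.1. The design (six observations, one factor with three
levels, $p_1 = 2$) and the numerical values of the variance components
are hypothetical and were chosen only for illustration. With
$U_2 = U_1$, two different sets of components having the same sum give
the same covariance matrix, so the observations cannot distinguish them.

```python
def gram(U):
    """Return U U' for a matrix U given as a list of rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in U] for r in U]


def covariance(sigma0, sigmas, Us):
    """Assumed structure: Sigma = sigma0 * I + sum_i sigma_i * U_i U_i'."""
    n = len(Us[0])
    Sigma = [[sigma0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for s, U in zip(sigmas, Us):
        G = gram(U)
        for r in range(n):
            for c in range(n):
                Sigma[r][c] += s * G[r][c]
    return Sigma


# Hypothetical design: 6 observations, 3 levels, 2 observations per level.
U1 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
Us = [U1, U1]  # U_2 = U_1, as in the counterexample (p_1 = 2)

A = covariance(1.0, [1.0, 2.0], Us)  # sigma_1 + sigma_2 = 3
B = covariance(1.0, [2.5, 0.5], Us)  # a different split with the same sum

# The two covariance matrices agree entry by entry.
same = all(abs(a - b) < 1e-12 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
print(same)
```

Only the sum of the components enters the covariance matrix here, which
is why the individual components are not estimable in such a model.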

The assumption which is restrictive is 7.i (p. 101), which requires
that all the diagonal elements of the matrix $U_i'U_i$ (of which there
are $m_i$) be less than or equal to some universal constant $R$ for all
$n$, $i = 1, 2, \ldots, p_1$. It was claimed in this paper that this
requires the number of observations at any level of any random factor
to be bounded and that it forces all normalizing sequences to be of the
order of $n$. That this assumption would be restrictive was shown in
Section 1.1 by reference to Section 6.1. That the claims are true can
be seen by the following argument. As Hartley and Rao point out, a
diagonal element of $U_i'U_i$ represents exactly the number of
observations at the appropriate level of the $i$th factor, and the sum
of these elements must be $n$. (Assumption 1.3.6, which was also used by
Hartley and Rao (p. 95), can be restated to say that every observation
is allocated to exactly one level of each factor and that each level is
allocated at least one observation.) Thus there are $m_i$ such elements,
each no greater than $R$ (which demonstrates the first claim) and
greater than zero, and adding up to $n$; it follows that
$m_i \ge n/R$, $i = 1, 2, \ldots, p_1$, which was to be shown.
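
The counting argument can be illustrated with a small sketch. The
allocation of observations to levels below is hypothetical; the counts
per level play the role of the diagonal elements of $U_i'U_i$, and $R$
is taken as the largest count, the smallest constant satisfying 7.i for
this one design.

```python
from collections import Counter

# Hypothetical allocation: entry k is the level of one random factor
# to which observation k is allocated.
levels = [0, 0, 1, 2, 2, 2, 3, 3]
n = len(levels)

counts = Counter(levels)   # observations per level: diagonal of U_i' U_i
m_i = len(counts)          # number of levels of the factor
R = max(counts.values())   # smallest bound on the diagonal elements

# The counts add up to n, and m_i counts each at most R, so m_i >= n / R.
print(sum(counts.values()) == n, m_i >= n / R)
```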

In an attempt to show that the details of the proof of consistency
omitted by Hartley and Rao may be important, a short outline of the
method of proof will be attempted. The object is to apply the method
of Wald (1949) and Wolfowitz (1949) to the likelihood function
$L(y,\theta)$. One shows that for each $\eta > 0$ and $\epsilon > 0$
there is an $h = h(\eta)$ and an $n_0 = n_0(\eta,\epsilon)$ such that
$h < 1$ and such that, for all $n > n_0$,
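$$P_0\left\{ \sup_{\theta \in \Omega(\eta)} \frac{L(y,\theta)}{L(y,\theta_0)} > h^n \right\} < \epsilon ,$$

where $\Omega(\eta) = \{\theta : \|\theta - \theta_0\| > \eta\}$ and
$\theta_0$ is the true parameter point. (That is, the likelihood
function cannot be large unless $\theta$ is near $\theta_0$.) This
is done by showing that

$$P_0\left\{ \sup_{\theta \in \Omega(\eta)} \left[ \frac{1}{n}\log L(y,\theta) - \frac{1}{n}\log L(y,\theta_0) \right] > \log h \right\} < \epsilon .$$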


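The following are some of the things which must be shown in order to
use the Wald-Wolfowitz argument.

(1) $\frac{1}{n}\log L(y,\theta) - \mathcal{E}_0\left[\frac{1}{n}\log L(y,\theta)\right] \to 0$ in probability for all $\theta$.

(2) $\mathcal{E}_0\left[\frac{1}{n}\log L(y,\theta) - \frac{1}{n}\log L(y,\theta_0)\right] < 0$ for all $n$.

(3) $\lim_{n\to\infty} \mathcal{E}_0\left[\frac{1}{n}\log L(y,\theta) - \frac{1}{n}\log L(y,\theta_0)\right] < 0$.

(4) Continuity and limit conditions on $L(y,\theta)$.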

Hartley and Rao prove (1) as Lemma 7.1 (p. 102). They state that (2)
and (4) follow from the assumptions; this is true, although a great
deal of work is necessary to prove (4). They do not mention (3) at all.
However, Condition (3) is very important. It is necessary that the
expected values in (2) approach a strictly negative limit in order to
insure that the probability of any particular difference of log
likelihoods being negative will converge to one. To see that (1) and (2)
above are not enough, consider the sequence of random variables
$Z_n \sim N\left(-\frac{1}{n}, \frac{1}{n^2}\right)$. Then
$Z_n - \mathcal{E}Z_n \to 0$ in probability and $\mathcal{E}Z_n < 0$ for
all $n$, but $P\{Z_n < 0\} = 0.8413$ for all $n$. Thus a condition like
Condition (3) must be proved; to prove this condition it is necessary
to use arguments of the same type used in Section A.4.1 to show that
the matrix $J$ was positive definite. It was shown in Section 1.1 that
arguments about the positive definiteness of $J$ were closely related
to problems of degenerating distributions. That such arguments must
also be taken into account in this version of a consistency proof is
entirely reasonable, since this result is stronger than Theorem 4.4.1
and so this version should take into account any difficulties that
arise there. Condition (3) can be proved using all the assumptions,
including 7.i, but it is not true when different parameters converge at
different rates. Thus this proof of consistency can be used under all
of Hartley and Rao's assumptions but will not work in the more general
cases because the limit in (3) degenerates to zero.
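
The probability quoted in the counterexample can be verified directly.
The sketch below assumes $Z_n \sim N(-1/n, 1/n^2)$, a choice of mean and
variance consistent with the stated value 0.8413; the function name
`normal_cdf` is introduced here only for illustration. The mean is
negative and tends to zero, yet the probability that $Z_n$ is negative
stays fixed at the standard normal value $\Phi(1)$ and never approaches
one.

```python
import math


def normal_cdf(x, mean, sd):
    """Distribution function of a normal variable with given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


probs = []
for n in (1, 10, 100, 1000):
    mean, sd = -1.0 / n, 1.0 / n        # assumed: Z_n ~ N(-1/n, 1/n^2)
    probs.append(normal_cdf(0.0, mean, sd))
    print(n, mean, round(probs[-1], 4))
```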

Interestingly enough, although the assumptions 7.i-7.iii are
necessary to prove consistency, they are not necessary to prove (1).
In fact the only necessities are that the elements of $X$ be bounded
and that $\theta$ be an interior point of the parameter space. To prove
(1) it suffices to prove, as Hartley and Rao did in Lemma 7.1 (p. 102),
that $\mathrm{Var}_0\left[\frac{1}{n}\log L(y,\theta)\right] = O\left(\frac{1}{n}\right)$. But

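$$\log L(y,\theta) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}(y - X\alpha)'\Sigma^{-1}(y - X\alpha).$$

Therefore,

$$\mathrm{Var}_0[\log L(y,\theta)] = \frac{1}{2}\,\mathrm{tr}(\Sigma^{-1}\Sigma_0)^2 + (\alpha-\alpha_0)'X'\Sigma^{-1}\Sigma_0\Sigma^{-1}X(\alpha-\alpha_0),$$

by Lemma 3.1,

$$\le \frac{1}{2}\,\lambda_{\max}^2(\Sigma^{-1}\Sigma_0)\left[n + 2(\alpha-\alpha_0)'X'\Sigma_0^{-1}X(\alpha-\alpha_0)\right],$$

by Proposition A.3.4,

$$\le \frac{1}{2}\max_{i=0,1,\ldots,p_1}\left(\frac{\sigma_{0i}}{\sigma_i}\right)^2\left[n + 2(\alpha-\alpha_0)'X'\Sigma_0^{-1}X(\alpha-\alpha_0)\right],$$

by Lemma B.7. But

$$(\alpha-\alpha_0)'X'\Sigma_0^{-1}X(\alpha-\alpha_0) \le (\alpha-\alpha_0)'(\alpha-\alpha_0)\,\lambda_{\max}(X'\Sigma_0^{-1}X)$$

and, by Lemma B.8,

$$\lambda_{\max}(X'\Sigma_0^{-1}X) \le \lambda_{\max}(X'X)\,\lambda_{\max}(\Sigma_0^{-1}) = \frac{\lambda_{\max}(X'X)}{\lambda_{\min}(\Sigma_0)} \le \frac{\lambda_{\max}(X'X)}{\sigma_{00}},$$

because $\lambda_{\min}(\Sigma_0) \ge \sigma_{00}$. But if the elements of
$X$ are bounded then $\lambda_{\max}(X'X) \le Kn$ for some constant $K$.
Therefore

$$\mathrm{Var}_0\left[\frac{1}{n}\log L(y,\theta)\right] = O\left(\frac{1}{n}\right),$$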


which  was  to  be  proved. 
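
The chain of bounds can be checked numerically on a small example. The
sketch below is an illustration only: the design, the parameter values
and the use of NumPy are all assumptions made here, not part of the
original argument. It evaluates the variance of $\log L$ from the
standard formula for the variance of a quadratic form in normal
variables (taken here to be the content of Lemma 3.1) and compares it
with the final bound, using $\Sigma = \sigma_0 I + \sigma_1 U_1U_1'$
with a single random factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 12 observations, an intercept and one bounded
# covariate, and one random factor with 4 levels of 3 observations each.
n = 12
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
U1 = np.kron(np.eye(4), np.ones((3, 1)))


def cov(s0, s1):
    """Assumed covariance structure: s0 * I + s1 * U_1 U_1'."""
    return s0 * np.eye(n) + s1 * (U1 @ U1.T)


sigma00, sigma01 = 1.0, 2.0      # true variance components (hypothetical)
sigma0, sigma1 = 0.7, 3.0        # components at the point theta
Sigma0, Sigma = cov(sigma00, sigma01), cov(sigma0, sigma1)
delta = np.array([0.5, -1.0])    # alpha - alpha_0

Sinv = np.linalg.inv(Sigma)
A = Sinv @ Sigma0
mu = X @ delta

# Variance of log L under the true parameter point.
var_logL = 0.5 * np.trace(A @ A) + mu @ Sinv @ Sigma0 @ Sinv @ mu

# Final bound: (1/2) max_i (sigma_0i / sigma_i)^2
#              * [n + 2 |delta|^2 lambda_max(X'X) / sigma_00].
ratio = max(sigma00 / sigma0, sigma01 / sigma1)
lam_XX = np.linalg.eigvalsh(X.T @ X).max()
bound = 0.5 * ratio**2 * (n + 2.0 * (delta @ delta) * lam_XX / sigma00)

print(var_logL <= bound)
```

Since $\lambda_{\max}(X'X)$ grows at most linearly in $n$ when the
elements of $X$ are bounded, the bound is of order $n$, and dividing by
$n^2$ gives the stated order $1/n$ for the variance of
$\frac{1}{n}\log L$.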
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